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Message from the Conference General Co-Chairs 

This year Graphics Interface is part of the combined Al/ 
GI/VI ’92 Conference held at the University of British 
Columbia in Vancouver, British Columbia. This year is 
noteworthy for a number of reasons. In addition to being 
the second time that all three conferences have been held 
together, this is the first year that the new name of the 
society is in effect and the first year that the new confer¬ 
ence logo is being used on an on-going basis. Last year 
CMCCS became CHCCS when “Man—Computer 
Communications” was changed to “Human-Computer 
Communications” in keeping with current usage. This 
year the “eyeball” logo that was originally designed for 
Graphics Interface/Vis ion Interface ’90 was adopted as a 
permanent conference logo. 

We thank all of the people who helped to organize the 
conferences and invite all of the attendees to enjoy the 
technical program, special events and the many pleasures 
of UBC and Vancouver during their visit. 

Kellogg Booth and Alain Fournier 
Al/Gl/Vl ’92 Co-Chairs 


Message from the Programme Chair 

This year’s technical programme consists of 34 carefully 
selected papers. An increase in submissions this year 
made our job particularly difficult, but the result is an 
excellent programme. I would like to draw your attention 
to the fact that, true to the name Graphics Interface , this 
programme contains original work covering a wide area 
of computer graphics and human-computer interaction. 
The programme committee was formed with this in 
mind, and it performed extremely well in the selection of 
the most desirable papers from among the many good 
submissions. I am very grateful to them for making my 
job easier. 

To ensure that each paper was reviewed by three ref¬ 
erees, many additional referees were recruited with 
specific research interests that are related to one or more 
of the submissions. The quality of refereeing was 
excellent, and I would like to thank these referees for 
their contributions to our conference. The names of both 
the members of the programme committee and of the 
additional referees are listed below. 

This year’s conference has seen an increase in participa¬ 
tion from Europe and the United States, which makes our 
conference truly international in character. I thank the 
authors of all papers for their submissions, whether their 
papers were accepted or not. I encourage all researchers 
who read this message to consider submitting a paper to 
Graphics Interface *93. 

Eugene Fiume, University of Toronto 
Graphics Interface ’92 Programme Chair 


Message des Copr£sidents general du congres 

Cette ann 6c Graphics Interface fait partie de la conference 
AI/GI/VI ’92 qui a lieu k l’universife de Colombie Britan- 
nique k Vancouver. L’ 6 v 6 nement est k remarquer pour 
plusieurs raisons. En plus d’etre la deuxi&me fois que les 
trois conferences ont lieu en meme temps, c’est la pre¬ 
miere arnfee que le nouveau nom de la socfefe parraine est 
officiel et que le nouveau logo de la conference devient 
permanent. L’amfee demfere la Socfefe canadienne du 
dialogue homme-machine est devenue la Socfefe cana¬ 
dienne du dialogue humain-machine (nom qui a l’avan- 
tage sur la version anglaise de n’entrainer aucun ehange- 
ment de sigle). Le logo du “globe oculaire” qui avait 6x6 
con 9 u initialement pour GI/VI ’90 a 6x6 aussi adopfe cette 
annde de fa 9 on permanente. 

Nous remercions chaleureusement tous les volontaires qui 
ont particip£s k l’organisation de la conference, et nous 
invitons tous les participants k profiter du programme de 
communications techniques, des 6 vdnements spdciaux et 
des nombreuses attractions du campus et de Vancouver 
lors de leur visite ici. 

Kellogg Booth et Alain Fournier 
AI/GI/VI *92 Co-pfesidents 

Message du president du programme 

Le programme technique de cette amfee est fait de 34 
communications soigneusement choisies. L’augmentation 
du nombre de communications soumises a rendu notre 
tache particulferement difficile cette amfee, mais 9 a fesulte 
en un excellent programme. Je voudrais souligner que 
fid&le au nom interface graphique ce programme pfesen- 
tent des travaux originaux dans une vaste gamme de sujets 
en infographie et en interface humain-machine. Le comife 
du programme a 6x6 compost en tenant compte de ce fait, 
et il s’est tfes bien acquitfe de sa tache en choisissant les 
communications les meilleures parmi de nombreuses 
soumissions de qualife. Je leur suis tfes reconnaissant 
d’avoir rendu mon travail plus facile. 

Pour assurer que chaque soumission soit considefee par 
trois arbitres experts, de nombreux arbitres suppfementai- 
res dans des aires de recherche sp£cifiques ont 6x6 recru- 
t6s. La qualife des revues a 6x6 excellente, et je voudrais 
remercier ces arbitres pour leur contribution k la confe¬ 
rence. Les noms des membres du comife et des arbitres 
suppfementaires sont domfes ci-dessous. 

Cette amfee la conference femoigne d’une augmentation 
de la participation de 1’Europe et des £tats-Unis, ce qui 
accroit son characfere international. Je remercie tous les 
auteurs des communications soumises, qu’elles aient 6x6 
accepfees ou non. J’encourage tous les chercheurs qui 
lisent ce message k penser k soumettre une communication 
k Graphics Interface *93. 

Eugene Fiume, University of Toronto 
President du programme de Graphics Interface ’92 
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Message from the President of CHCCS 

It is with great pleasure that I welcome you to Vancouver 
and Graphics Interface ’92. This is the eighteenth in a 
series of graphics conferences started in 1969 in Ottawa 
and held across Canada in various years. It is the longest 
running graphics conference in the world and continues 
to provide a significant contribution to the field of 
computer graphics and interactive techniques. The 
Canadian Human-Computer Communications Society is 
pleased to be able to sponsor such a conference and is 
indebted to all those who have contributed time and 
effort into making it the success that it is. 

Again, following the trend established in 1986 (coinci¬ 
dentally also held in Vancouver), Graphics Interface is 
being held in conjunction with Vision Interface. Holding 
these two conferences jointly has helped to promote the 
interchange of ideas and raise the profile of both confer¬ 
ences. This year we are joined by Artificial Intelligence 
’92, the Canadian Artificial Intelligence Conference. 

It is important that the efforts of all those who have 
helped to organize these conferences and prepare the 
technical programs and proceedings be acknowledged 
for their contributions. Space does not permit us to list all 
of the many people who have helped out. A partial list 
appears in the proceedings, but there are many more 
including the student volunteers and others who have 
played significant roles behind the scenes in putting on 
this year’s conference. 

Wayne A. Davis 
President, CHCCS 


Message du president du SCDHM 

C’est avec grand plaisir que je vous souhaite la bienvenue 
k Vancouver k la Conference Graphics Interface ’92. II 
s’agit de la dix-huitfeme d’une s6rie de conferences sur 
1’infographie amorcdes en 1969 k Ottawa et pfesenfees un 
peu partout au Canada au cours des demteres amfees. 

C’est aussi la plus longue s6rie au monde de conferences 
sur l’infographie, et elle continue de’apporter une contri¬ 
bution importante k l’infographie et aux techniques 
interactives. La Socfefe canadienne du dialogue humain- 
machine est heureuse de pouvoir parrainer une telle 
conference et elle est redevable k ceux et celles qui ont 
contribu£ temps et efforts pour en faire un succfcs. 

Confomfement k la tendance €tablie en 1986 (par coinci¬ 
dence aussi k Vancouver), Graphics Interface se tient en 
meme temps que Vision Interface, ce qui contribue k 
promouvoir Ifechange d’ktees et k rehausser le profil des 
deux conferences. Cette ann£e nous sommes aussi avec 
Artificial Intelligence '92, la conference canadienne sur 
1’intelligence artificielle. 

Enfin, if faut souligner la contribution de tous ceux qui ont 
aid 6 k organiser la conference et k pfeparer le programme 
technique et les compte-rendus. Nous sommes trfcs 
reconnaissants de leurs efforts. Nous n’avons malheureu- 
sement pas assez d’espace pour donner une liste complete. 
Une liste partialle apparait dans les comptes-rendus, mais 
il y a de nombreuses personnes, en particulars les 6tu- 
diants b6n£voles, dont les noms manquent, et qui ont jou6 
un rdle significatif pour que cette conference ait lieu et soit 
un succfcs. 

Wayne A. Davis 
President, SCDHM 
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1992 CHCCS Achieve- 
ment Award — Wayne 
Davis, University of 
Alberta 

The recipient of this 
year’s award is Wayne 
Davis. The Society 
wishes to recognize and 
acknowledge his un¬ 
swerving commitment 
and many contributions 
to the Society. We also 
recognize that his many 
activities have done 
much to maintain a 
hospitable climate in 
Canada for research and 
development in computer 
graphics and human- 
computer interaction. 

Wayne was bom in Fort 
Macleod, Alberta in 1931 
and received his elemen¬ 
tary schooling there. He 
received a B.S.E. degree 
in Engineering (Math¬ 
ematics option) at 
George Washington 
University in 1961. 

Shortly afterwards he 
went to Ottawa to 
continue his studies at 
the University of Ottawa, 
receiving his MSc and 
PhD degrees in Electrical Engineering in 1963 and 1967, 
respectively. Wayne continued his studies while he was a 
Research Scientist with the Communications Research 
Centre and its predecessor, the Defence Research 
Telecommunication Establishment (Ottawa). Since 1969 
Wayne has been on the faculty of the University of 
Alberta and has been a full Professor since 1977. Wayne 
took an early retirement from the formal side of the 
University in 1991, but has remained as the first Profes¬ 
sor Emeritus in the Department of Computing Science. 
He was a founding member and later Acting Director of 
the Alberta Centre for Machine Intelligence and Robotics 
(ACMIR). 

Wayne has wide ranging interests that span computer 
graphics, image processing and other aspects of comput¬ 
ing. His specific research interests have ranged from 
geographical information systems to image processing 
and remote sensing. Wayne supervised one of the first 
PhD students in the Department of Computing Science at 
the University of Alberta. Subsequently, Wayne super¬ 
vised many PhD and MSc students, both in image 
processing and in graphics. His research, and that of his 
students, has resulted in about one hundred publications 
in professional journals, conference proceedings and 
technical reports. 


Plaque d’honneur de la 
SCDHM 1992 — 

Wayne Davis, Univer¬ 
sity de P Alberta 

Le laur£at cette ann£e 
pour recevoir la plaque 
d’honneur de la SCDHM 
est Wayne Davis. Nous 
voulons par 1 k reconn aT- 
tre son remarquable 
d£vouement et ses 
nombreuses contribu¬ 
tions k la soci6t£. Nous 
voulons aussi reconnaitre 
que ses nombreuses 
activitds sont largement 
responsables pour avoir 
ytabli et maintenu au 
Canada un climat 
favorable k la recherche 
et au developement en 
infographie et en 
interaction humain- 
machine. 

Wayne est n 6 k Fort 
Macleod, Alberta, en 
1931, et aregu 1 k son 
instruction yiymentaire. 

II a re 9 u un baccalaur£at 
en g£nie (option math£- 
matique) de 1*university 
George Washington en 
1961. Peu aprfcs il s’est 
rendu k Ottawa pour 
poursuivre ses Etudes a l’universitd d’Ottawa, d’oO il a 
re 9 U sa maitrise et son doctorat en 1963 et 1967, respecti- 
vement. Wayne a continue ses dtudes tout en ytant un 
chercheur scientifique au Centre de recherche en commu¬ 
nications, et k son pr&tecesseur, l’fitablissement de la 
defense en recherche en tyiycommunications, k Ottawa. 
Depuis 1969 Wayne est membre de la faculty de l’univer- 
sity de 1’Alberta, et est professeur titulaire depuis 1977. 
Wayne a pris une retraite anticipde en 1991, mais a guardy 
contact avec Puniversity en devenant le premier professeur 
emeritus du dypartement d’informatique. Il a yty un 
membre fondateur, et plus tard directeur pro tempore de 
P Alberta Centre for Machine Intelligence and Robotics 
(ACMIR). 

Le domaine d’intyr§ts de Wayne recouvre 1*infographie, le 
traitement d’images et beaucoup d’autres aspects de 
1’informatique. Spycifiquement ses recherches sont aliys 
des sytfcmes d’information gyographiques au traitement de 
1’image et k la tdiy-dytection. Wayne a dirigy une des 
toutes premieres theses de doctorat en informatique k 
1’university de P Alberta. Il a ensuite dirigy de nombreux 
autres dtudiants de maitris et de doctorat, aussi bien en 
traitement d’image qu’en infographie. Seul ou avec ses 
ytudiants il a publiy environ une centaine de communica¬ 
tions dans des revues, des confdrences ou dans des 
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In the 1980s Wayne was one of the main participants in 
the cooperative program with the Department of Compu¬ 
ter and Information Science of the Harbin Shipbuilding 
Engineering Institute, Harbin, People’s Republic of 
China. He has made several visits to Harbin, lecturing 
there and in other places in the People’s Republic of 
China. Wayne has maintained links with university life 
in China; since 1985 he has been an Honorary Professor 
in the Department of Computer Science and Information 
Science at Harbin Shipbuilding Engineering Institute. 

Wayne has been the guiding light and untiring leader of 
the Canadian Human-Computer Communications 
Society (CHCCS) since it evolved from its humble 
beginnings. Through his parallel activities in the Cana¬ 
dian Image Processing and Pattern Recognition Society 
(CIPPRS) and in the Canadian Society for Computa¬ 
tional Studies of Intelligence (CSCSI) he was responsible 
for bringing the respective conferences under one roof. A 
first tentative step to such conference was a graphics 
stream in the CSCSI Conference, organized by Wayne 
and held in Victoria in 1980 (a CM CCS off-year). After 
much persuasion by Wayne, either or both of the other 
two societies joined with CMCCS, and now CHCCS, to 
hold conferences in succeeding years. 

Wayne’s interest in, and allegiance to, Western Canada 
has always been evident in his efforts to make CHCCS 
national in scope. While the early conferences were held 
in Ottawa with an occasional foray into other parts of 
Eastern Canada, Wayne has been instrumental in bring¬ 
ing the conferences to Victoria, Vancouver, Edmonton 
and Calgary. In each case Wayne used his powers of 
persuasion to ensure that there was a strong conference 
organization. Consequently these conferences were 
highly successful. By making the Society such an 
ongoing success he has contributed inestimably towards 
the maintenance of a climate for exchanging research 
ideas and for providing researchers and especially 
students with a place to publish and to interact with 
researchers at other academic institutions. 

Through this award, members of the Society wish to 
acknowledge Wayne’s many contributions to the overall 
development of computer graphics and human-computer 
interaction research in Canada. 


rapports techniques. 

Dans les anndes 1980, Wayne a 6t6 un des principaux 
participants dans le programme coop^ratif avec le d^parte- 
ment d’informatique de l’lnstitut de genie de construction 
navale d* Harbin, en Republique de Chine. II a visits 
Harbin plusieurs fois, en donnant Ik et en plusieurs autres 
endroits de Chine des s^minaires. Wayne a maintenu des 
liens etroits avec le monde universitaire chinois. Depuis 
1985 il est professeur honoraire du departement d’infor¬ 
matique de l’lnstitut de gdnie de construction navale 
d’Harbin. 

Wayne a ete le phare et le leader infatigable de la Societe 
canadienne du dialogue humain-machine (SCDHM) 
depuis ses modestes debuts. A travers ses activity paralfe- 
les dans 1’Association canadienne de traitement de 
Fimages et de reconaissance des formes (ACTIRF) et de la 
Societe canadienne pour l’dtude de 1’intelligence par 
ordinateur (SCEIO) il a 6t6 responsable pour Fintegration 
de leurs conferences sous un seul toit. La premiere 
experience avec une telle conference a 6t6 une filfere 
infographie lors de la conference de la SCEIO, organisee 
par Wayne, qui eut lieu k Victoria en 1980 (une annee sans 
conference SCDHM). Aprhs avoir ete convaincues par 
Wayne, une ou deux des societes ont joint SCDHM pour 
tenir des conferences dans les annees suivantes. 

L’inter§t et la fideiite de Wayne au Canada de Fouest a 
toujours ete en evidence, y compris dans ses efforts pour 
faire de la SCDHM une societe k Fechelle nationale. Alors 
que les premieres conferences ont eu lieu k Ottawa, avec 
des pointes dans d’autres villes de l’est canadien, Wayne a 
ete le moteur principal dans le mouvement des conferen¬ 
ces k Victoria, Vancouver, Edmonton et Calgary. Dans 
chaque cas Wayne a utilise sa force de persuasion afin de 
garantir une bonne organisation, et en consequence ces 
conferences ont ete un grand succhs. En assurant ainsi le 
succhs de la societe il a contribue immensement k la 
creation et au maintien d’un climat favorable k Fechange 
des idees et k la presence d’un lieu oh les chercheurs et 
surtout les etudiants chercheurs, peuvent publier et 
interagir avec leurs collhgues des autres sites universitai- 
res. 

Par Fattibution de cette plaque, les membres de la societe 
veulent souligner les nombreuses contributions de Wayne 
au developpement general de 1* infographie et de 
Finteraction humain-machine au Canada. 
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An Implementation of Multivariate B-Spline 
Surfaces over Arbitrary Triangulations 

Philip Fong and Hans-Peter Seidel * 


Computer Graphics Laboratory, 
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Abstract 

Recently in [7], a new multivariate B-spline scheme 
based on blending functions and control vertices was 
developed. This surface scheme allows C k ~ l -continuous 
piecewise polynomial surfaces of degree k over arbitrary 
triangulations to be modelled. Actually, piecewise poly¬ 
nomial surfaces over a refined triangulation are pro¬ 
duced given an arbitrary triangulation. The scheme 
exhibits both affine invariance and the convex hull prop¬ 
erty, and the control points can be used to manipulate 
the shape of the surface locally. This paper describes 
a test implementation of the scheme for quadratic and 
cubic surfaces. Issues such as evaluating points on the 
surface, evaluating derivatives on the surface and repre¬ 
senting piecewise polynomial surfaces as linear combi¬ 
nations of B-splines will be discussed. Several examples 
illustrate the implementation. The work is incorporated 
into a surface editor which is currently being developed 
at the University of Waterloo. 

Keywords: Blossoming, B-patch, B-spline Burface, 
blending functions, control points, simplex splines, po¬ 
lar forms. 

1 Introduction 

Tensor-product B-spline surfaces [1, 2, 8, 9, 14, 24] have 
proven themselves an excellent tool for the modelling of 
free form surfaces. However, tensor-product surfaces 
also have their well-known draw-backs if the modelling 
of largely irregular objects is required. Therefore, not 
surprisingly, the need for B-splines over non-rectangular 
regions has been expressed quite early [25]. 

Splines over arbitrary triangulations of the parame¬ 
ter plane have first been considered in [5, 17]. These 
multivariate splines are defined as projections of sim- 
plices and are therefore called simplex splines. The 

*This work has been partly supported by the Natural Sci¬ 
ences and Engineering Research Council of Canada through 
Personal Operating Grant OGP0105573 and Strategic Op¬ 
erating Grant STR0040527 


main drawback of simplex splines in the past has been 
the difficulty to form linear combinations and the ab¬ 
sence of control points. 

A different approach has been taken in [30]. The B- 
patches developed there are based on the study of sym¬ 
metric recursive evaluation algorithms and are defined 
by generalising the de Boor algorithm for the evalu¬ 
ation of a B-spline segment from curves to surfaces. 
B-patches have control points but the construction of 
smooth surfaces still requires considerable computation. 

Other approaches to the construction of B-splines 
over irregular domains have been based on subdivision 
[3, 11], interpolation [21], and on the use of multisided 
patches [18, 19, 26]. However, each of these schemes 
has its own difficulties. 

One really needs a scheme which constructs auto¬ 
matically smooth complex surfaces and which contains 
control vertices for shape manipulation. A new multi¬ 
variate B-spline scheme based on a combination of B- 
patches and simplex splines which meets these criteria 
was developed in [7]. This paper discusses details of an 
implementation of it which is being used in a surface ed¬ 
itor being developed at the University of Waterloo. A 
process of converting piecewise Bezier polynomials to 
this new scheme and vice-versa will be explained. This 
leads to a method for surface refinement. 

The paper is divided up in the following way. Sec¬ 
tion 2 introduces some notation which will be used in 
the remainder of the paper. Section 3 describes the new 
B-spline scheme. Section 4 discusses the implementa¬ 
tion while Section 5 illustrates the new B-spline scheme 
through examples. We finally finish off with some con¬ 
cluding remarks. 

2 Notation and Definitions 

This section introduces some notation which is used in 
the rest of the paper. 

Let W = {wojWijWj} C R 2 be a set of affinely 
independent points and let u E R 2 . If the determinant 
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d(W) is defined as 

d(W) = det f 1 1 1 ^ 

' \ W 0 W! W 2 J 

and the determinant d 7 (u|VP) as d(W) with the point 
w j being replaced by u, then the barycentric coordinates 
of u with respect to the ordered set W are given as 


Note that 


d(W) ' 


j = 0... 2 


( 1 ) 


2 2 

u = Yi ^( u ) w j and S = L 

J=0 j =o 

For a set V = {vo,...»v n } C II 2 , we let [V] denote 
the convex hull of V and we let (K) denote the half¬ 
open convex hull of V. The definition of the half-open 
convex hull is given in [29] and is repeated here. 


Definition 2.1 (Half-Open Convex Hull) Given 
Vo,..., v n E JR 2 , the half-open convex hull is then de¬ 
fined as follows: Let £ be the unit horizontal vector in 
JR 2 . A point u 6 JR 2 belongs to the half-open convex 
hull [vo,..., v n ) if and only if there exists a vector 7 } 
with positive slope and a positive scalar e such that the 
set + ti} | 0 < s, t, 0 < s -J- t < e) is completely 
contained in the interior of [vo,..., v n ). 


3 Bivariate B-Splines 

The new B-Spline scheme is obtained by matching B- 
patches [28, 30] with simplex splines [5, 17]. By match¬ 
ing we mean that the recurrence relation which de¬ 
scribes simplex splines is made to agree with the re¬ 
currence relation for B-patches under some conditions. 
Before we present this, we need some background infor¬ 
mation on simplex splines and B-patches. 


3.1 Simplex Splines 

Definition 3.1 (Simplex Splines) 

Let V = {to,..., t m } be a finite set of points in R 2 and 
let u be a point in R 2 . The simplex spline M(u\V) = 
M (u|to,..., t m ) is defined recursively as follows. For 
V = {to, ti, t 2 } we let 



(2) 

where 


Xrt t t >(*0= I 1 ifu€ [to.ti.M 

[ 0 »t x ,t 2 )V ) | q otherwise 

(3) 

is the characteristic function on [to,ti,ta). 
{to,..., t m }, m > 2, we set 

For V = 


(4) 


where W = {ti 0 , t tl , t, a } is any subset of affinely in¬ 
dependent points in V [20]. The points t 0 ,..., t m are 
referred to as knots. 

It is worth mentioning that the above definition is 
completely independent of the choice of W [20]. 

Equations (2) and (4) differ slightly from the ones 
given in [17, 20] in that we have based the character¬ 
istic function on the half-open convex hull [to,ti,t 2 ) 
(Definition 2.1) instead of the convex hull [to,ti,ta]. 
Otherwise, problems arise when the recurrence relation 
is used for points u which lie along knot lines (lines 
connecting any two knots) [20]. The above definition 
alleviates the problem by modifying the area of sup¬ 
port for the B-spJines. This is analogous to the case for 
univariate B-splines [27] where they are non-zero on the 
half-open interval [t 0 ,ti) instead of the closed interval 
[to.tij. 

The simplex splines M (u| V) then exhibit the follow- 
ing properties: 

• Piecewise polynomial of degree k = m - 2 

• Local support on the closed convex hull [V] 

• Non-negative - M(n\V) > 0 for all u E Ft 2 

• C k ~ l -continuous everywhere 

Further information can be obtained from [5, 6, 15, 16, 
17, 20, 32]. 

Definition 3.1 shows us that plenty of simplex splines 
exist. The question which remains is how to form lin¬ 
ear combinations from them such that piecewise poly¬ 
nomial surfaces over arbitrary triangulations can be 
constructed. This involves choosing the right simplex 
splines and the right normalization. These problems 
may be solved by studying B-patches. 


3.2 B-patches 

B-patches [30] are a patch representation for polyno¬ 
mial surfaces that arises from generalizing the de Boor 
algorithm from curves to surfaces [28, 30]. One defini¬ 
tion of B-patches is by means of their blending functions 

Bfa). 

Definition 3.2 (B-patch Blending Functions) 

Let A(/) = [ti ol t{j, t{ 3 ] E JR 2 , I = (io.ii, * 2 ) 
be given along with the additional set of knots 
ft'0,0.... 1 t; 0f fc_ 1 , t{j f o,..., t jj 1 , t t o,. .. , t,* 2i fc_i in 
R such that tt 0i o — t , 0f t* lt o — t 7l , and t{ 0l o = 
ti 2 . Also, assume that every triple set of knots 
(t* o ,0o 1 ^Ut/9i» ^»3.j0a ), 0 ^ 0o + + 02 < A: — 1 is 
affinely independent, i.e. [t^,^ , tij,^ , ti 3 ,fo] forms a 
proper triangle. Then, for u E JR 2 , the B-patch blend¬ 
ing functions B^( u), \0\ = k, of degree k over A(/) 

are given by the recurrence 

£( 7 o ( o,o)(u) = 1, (5) 
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and 


Hi «) = »*'_.,(„), 1/31 > 0 - ( 6 ) 

j=0 

Terms u/iifi ne^oitve indices are set to zero and 
Xpj(u) = <£j*(u|H^)/c2(PV]g) are the barycentric coordi - 
no<ej o/u uiit/i respect to = {ti 0lA)1 t lllftl ti, lft }. 
Here e° = (1, 0,0), e l = (0,1,0), and e 2 = (0,0,1). 


The B-patcli blending functions form a partition of 
unity [30], i.e. ]C|/ 9 |=j» B p( n ) = Every polynomial 
surface F can be represented as a linear combination of 
them as follows: 

F(u) = £ c^(u), cjS e n s (7) 

I0l = * 


where 




( 8 ) 


/(fto.Oi * * * l ^*0»^0» * 0 } « • • > , ti 3| 0, . . . ti 3} £ 3 ) 


are the B-patch control points which form the B-patch 
control net. Here / represents the blossomoi polar form 
of F [10, 22, 23]. The representation given by (7) is 
called the B-patch. 

The shape of a B-patch is strongly influenced by the 
shape of its control net. We can form larger surfaces 
by piecing together individual B-patches. However, the 
construction of overall smooth surfaces still requires 
quite a bit of computation. What is needed are blending 
functions which produce smooth piecewise polynomial 
surfaces automatically. Simplex splines, which were in¬ 
troduced in Section 3.1, give us just that and by combin¬ 
ing them with B-patches, we are led to the new B-spline 
scheme. 


3.3 The New B-Spline Scheme 

The development of the B-spline scheme in [7] is based 
upon the fact that the recurrence relations (2),(4) and 
(5),(6) agree under the proper renormalization and the 
proper selection of knots. We now briefly describe its 
construction. 

Let T — {A(J) = [t« 0l tti,t{,] | I = (1*0,11,12) € 
I C Z\} define a triangulation of It 2 or some bounded 
domain D C R 2 . Then, for any two /,J 6 T, A(J) n 
A(J) is empty or is a common vertex or edge of A(/) 
and A(J) (see Fig. 1). 

Next, a sequence of knots t{ f o,...»t t| fc is assigned to 
each vertex tj in the triangulation such that t^o = t t * 
and that any set of three knots is affinely independent. 
The sequence of knots ti,o,..., t,-,* iB referred to as the 
cloud of knots associated with the vertex t<. We are 
now in a situation to construct simplex splines of degree 
k over the triangulation T. We consider the following 
simplex splines: 

M(u|V) = Af(u|V^) (9) 



Figure 1: Triangulation of a bounded domain D C 
JR 2 


where I 6 T, |/?| = fc, and 

Vfi = (10) 

{ti'o.Oj • • • > tin,/3o i tt, ,0i • • •, ti, ,0, , t.'a,0, • • • > }. 

We define the regions as follows: 

H ■■= r\<0«], n{ : = int(n w=fc n£). (11) 


We uIbo assume that ^ 0 which can be obtained if 
each of the clouds of knots associated with the three 
vertices of a triangle is kept separate from one another. 
In other words, for each vertex t* in the triangulation T, 
its cloud of knots is contained within a circle C< centred 
at t,* such that C,*nC ; - = 0, for all i ^ j (i.e., none of the 
circles intersect one another). Figure 2 below illustrates 
an example of this Betup. 


t{ 3 ,2 t i 2i i 



Figure 2: The region 


Then, under these conditions, it is shown in [7] that 

B}{ u) = \d{W^)\M{u\V^) } for all u € (12) 

where \f3\ — k t Vj/ is defined in (10) and 
Wp - {ti 0f/3o , ti ll/3l , ti a ^ 3 }. From (12), we let the 
normalized B-splines be defined as 

H («) == \d(W^)\M(n\V^). (13) 

These will be the blending functions used in the new 
B-spline scheme. A B-spline surface F of degree k over 
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a given triangulation T with a knot net fC = {ti f j | i E 
l = 0,..., k} can then be defined bb 

W = X) X c '.^i( u )- ( 14 ) 

/€T |/3|=fc 

The E R 3 are the control points which make up 
the control net for the surface F. 

Since both simplex splines and B-patches are used 
to develop the new Bcheme, their individual properties 
are inherited by the new scheme. It is these proper¬ 
ties which make it possible (relatively easily) to model 
C 1 * -1 -continuous piecewise polynomial surfaces of de¬ 
gree k over arbitrary triangulations. 

Affine Invariance: The relationship between the 

control points and the B-spline surface is invariant 
under affine coordinate transformations. That is, if 
# : R 3 -► R 3 is an afRne map (rotation, translation, 
scaling), then 


*(X X c i . p n p ( u ))=X X 

l€ T|0l=fe J€X|/9|=fc 

(15) 

Convex Hull Property: A B-spline surface lies in 
the convex hull of its control points. 


4.1 Evaluation 

The most important algorithm required is one which 
evaluates points on the surface. That is, given a pa¬ 
rameter value u 6 R 2 in the triangulation T, we want 
the value of the point on the surface corresponding to 
u. We use equation (14) as the basic formula in our 
algorithm. Since the normalized B-splines Np{ u) are 
the most complex terms in the equation, we will only 
concentrate on them. 

In order to evaluate the normalized B-splines Np(u) 
defined in (13), we first need to compute the simplex 
splines M(u\Vp) defined by the recurrence (2), (4). We 
start off by describing the evaluation of linear simplex 
splineB because all higher order splines are composed of 
these. 

Let V/ = {to, ti, t 2 , ta} C R 3 and without loss of 
generality, let the set W = {t 0 ,t l ,t 2 }. Then, after 
expansion of the recurrence and substitution of the base 
case (2), the linear simplex spline becomes 


M(u|K/) 


rfo(u[HQX|t 1 ,t„t,)(“) , 

d(W) |<t(ti,t 2 ,t s )| + 

rfi(n|W r )* 1 t D ,t„t,)(“) 

d(W) + 

< fa(uHr)x [ t 0 ,t > ,t l) (») 

d(W) \d(t 0 ,tuU)\- {lb) 


Local Support: Movement of the control point 
only influences the region of the surface on A(7) and 
those surrounding it. 

Continuity A degree k B-spline surface is a piecewise 
polynomial of degree k over the sub-triangulation in¬ 
duced by its knot net that is C^” 1 -continuouB every¬ 
where if its knots are in general position. But, from the 
theory of simplex splines, knot multiplicities along a line 
reduce the order of continuity along this line [20]. For 
example, a degree 2 surface with knots in general posi¬ 
tion is C l -continuous everywhere. Placing three knots 
on a line reduces the continuity to C° and placing four 
knots produces a discontinuity along the line. Thus, 
the underlying knot net provides additional degrees of 
freedom to control the shape of the surface. Figure 3 
shows the quadratic normalized B-splines over different 
knot configurations. 


4 Implementation 

The theory presented in the previous sections is used 
in a surface editor which is being developed at the Uni¬ 
versity of Waterloo. A surface editor allows one to ma¬ 
nipulate the shape of a. surface through the movement 
of the control vertices which make up the control net. 
The new B-spline scheme also allows surface changes 
to be made through movement of knots. This section 
describes some of the algorithms used in the editor. 


One can evaluate (16) by blindly computing and plug¬ 
ging in values for each of the terms. However, depend¬ 
ing on the value of the characteristic function x (3), 
some of the terms may be zero. This can lead to a 
very inefficient evaluation technique. A better method 
is to compute only those terms for which the character¬ 
istic function is non-zero. This is, in itself, governed 
by the choice of W . To do this, we need to know 
where in the knot configuration the point u lies. This 
involves looking at the various knot configurations for 
a linear spline (see Fig. 4). We first point out that 



to to ti to tg ti to = ta ti 


Figure 4: The four essentially different knot config¬ 
urations for a linear B-spline M(u|t 0 , t Xl t 2 , t 3 ). 


each of the configurations is composed of the four trian¬ 
gles (although some may be degenerate); Afto.tijtj], 
A[to,ti,ta], A[to,t 2 ,ta], A[ti,t 2 ,ta]. Each of these 
triangles appears in at most one of the characteristic 
functions in (16) or in the set W. For every single one 
of the configurations above, if a point u is inside its 
half-open convex hull, then u belongs to exactly two of 
its triangles. For instance, in the leftmost configuration 
above, if u E [t 0 , t x1 1 2 ), then either u E [to, t x , t 3 ) or 
u € [ti, t 2 , ts) but not both. If u lies on the boundary 


Graphics Interface ’92 



5 



Figure 3 : The quadratic normalized B-splines N[ 10 (u) and WjooC 11 ) over the six different knot configura¬ 
tions. ® is a double knot and @ is a triple knot. 
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between two triangles or on the vertex of two or more 
triangles, then Definition 2.1 for the half-open convex 
hull ensures that u only belongs to one of them. It is 
this, which allow points on knot lines to be evaluated 
correctly [16, 20]. 

Using these facts then, we let W be one of the trian¬ 
gles in which u belongs. In light of the above discussion, 
this will force two of the three characteristic functions 
to be zero, and hence leaving only one term in (16) to 
be evaluated. 

The only real work involved then is to figure out 
which triangles the point u belong. One way iB to calcu¬ 
late the barycentric coordinates of u with respect to the 
vertices of the triangle. If the coordinates are all greater 
or equal to zero, then u is inside, otherwise it’s outside. 
Of course, a slight modification has to be made since 
we are dealing with the half-open convex hull instead 
of the closed convex hull. 

In the implementation of the surface editor, the com¬ 
putation is organized such that intermediate results ob¬ 
tained from determining the regions containing u are 
later re-used in the evaluation of the linear B-spline. In 
this way, only a small number of determinants need to 
be explicitly computed while others can be derived from 
some linear combination of these. 

Higher order simplex splines (m > 4) are simply com¬ 
puted using the recurrence relations (2),(4). We do not 
try to optimize the computation like we did in the lin¬ 
ear case above because it is not worthwhile due to the 
increase in the number of knot configurations. At each 
level of the recurrence, any choice is suitable for the set 
W C Vi as long as it forms a proper triangle. However, 
a good choice is to pick W such that u E [W] which 
gives positive barycentric coordinates. This eliminates 
any negative terms and hence, increases the numerical 
stability of the evaluation [16]. 

Having computed the value for the simplex spline, we 
can get the value for the normalized B-spline from (13) 
and finally evaluate the point on the surface from (14). 

4.2 Derivatives 

A directional derivative along a given direction v E R 2 
for a parameter value u E R 2 may be computed in the 
same manner as in its evaluation. The only difference 
lies in the fact that the barycentric coordinates of a 
vector v add up to zero instead of one, i.e. 

2 2 

^(ij(v) = 0 andv=^ W (v)t i . (17) 

j = 0 j=0 

The directional derivatives for degree k simplex splines 
is then given as 

2 

P v M(u|V) = k m(v)M{u\V\{n 3 }) (18) 

j=o 

with V as defined in Definition 3.1. Then the directional 
derivative along the direction v at a parameter value u 


for a surface F is given by 

PvF(u) = ]T £ c'PvWj(u) (19) 

J€T !0|=fc 

with 

VvNii u) = |<£(W/)|PvJtf(u|lO. (20) 

We can then use (19) to calculate the tangent normal 
for a point on the surface in the following way. Any 
two directional derivatives v it v 2 E R 2 (vi / av 2 ,a E 
R) are computed at the point and the resulting cross- 
product V Vl F(u) x V\^F{ u) will yield the desired 
normal vector. The surface editor makes use of these 
directional derivatives in its surface shading (Gouraud) 
routines. 

4.3 Piecewise Polynomial Surfaces 

For a surface scheme to be as flexible as possible, it 
must be able to represent as many surfaces as possible. 
This section shows that any piecewise polynomial sur¬ 
face F over a triangulation T can be represented as a 
linear combination of normalized B-splines Np(\ i). It 
also shows that B-splineB can be represented as piece- 
wise Bezier surfaces. 

The precise statement for the representation of piece- 
wise polynomials as linear combinations of B-splines is 
as follows [31] 

Theorem 4.1 Let F be any piecewise polynomial sur¬ 
face of degree k over a given triangulation T that is 
C k ~ 1 -continuous everywhere and let fi be the polar 
form of the restriction of F to the triangle A (/), / E X- 
Then 

= X! c ''* 3; ^( u ) ( 2i ) 

/€T|j0|=fc 

wit/i 

Cj,i — /j(tt 0 ,o> • • • j t't'o.lo —i > • • •», tt 3 t i 3 —i). 

( 22 ) 

If we let F = 1, then its polar form / = 1 and from 
Theorem 4.1, we get ^i( u ) = 1 which shows that 

the normalized B-splines N^(u) form a global partition 
of unity. 

Piecewise Bezier surfaces F of degree k over irregu¬ 
lar triangulations T can be converted to B-spline sur¬ 
faces of the same degree by using Theorem 4.1. Briefly, 
the algorithm is as follows. For each vertex t; of a 
given triangulation T of a bounded region D C R 2 , 
knots tt,o»• * • ,ti t * are assigned (in general position) to 
it such that U t o = t{. The assignment must follow the 
conditions in Section 3.3 and in addition, vertices on 
the boundary of D must have their knots outside of D. 
Then, polar forms /; of the restriction of F to every 
A(7) E T, I E J, are computed using the multiaffine 
version of the de Casteljau algorithm [10, 23]. The B- 
spline control points are then obtained by evaluating 
fi using (22). 
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Converting degree k B-spline surfaces F over arbi¬ 
trary triangulations T to piecewise Bezier surfaces F 
requires a bit more work. We need to come up with two 
things: a triangulation and the Bezier control points. 
We cannot use T as our triangulation, as in previous 
case, because we now need a finer triangulation due to 
the additional lines introduced by the knots. A finer tri¬ 
angulation can be derived from the knot net associated 
with each A(/) G T, / 6 I (see Fig. 2). Let’s consider 
only the A[ti 0 l o, t» tl o, ,o] in Figure 2 and itB interior 
including all line segments passing through the interior 
We have, in effect, divided up the triangle into regions 
using the knot lines. Note that not all of the regions 
in A[t, 0|0 , t« lf o, t,* a ,o] are triangular; so, we must further 
divide (arbitrarily) these regions up. The resulting con¬ 
struction yields the refined triangulation Tj restricted 
to A(/). Then, the refined triangulation for the piece- 
wise Bezier surface F is T = U/Tj, A(i) G 2 1 , / € T. 

Having constructed the triangulation, we next deal 
with the Bezier control points. For each domain trian¬ 
gle A(/) € T, we associate with it, a Bezier triangle 
with control vertices b^. From the theory of Bezier 
triangles [12], a Bezier surface interpolates the corner 
vertices (those which lie at the corners of the domain 
triangles) of its control net. Thus, these corner control 
vertices will precisely lie on the surface F and can be 
computed by evaluating F(t*) for all vertices ti in the 
triangulation T. The other control vertices are given by 

b/g = f(U 0 ,o» ■ • •»t < 0 ,o> • ♦«> ^* 3 , 0 > • ■ • j ^t' 3 , 0 ) (23) 

V -V-' V -V-' 

0o times /9 3 times 

where / is the multiaffine definition of F. 

4.4 Refinement 

For practical purposes, surface schemes must also allow 
for refinement or subdivision [4, 13]. The idea is that 
fine detail may be required for parts of the surface but 
the existing control points do not allow for the mod¬ 
elling of such detail. Thus, we need to be able to add 
extra control points only to those regions, 

Then, for the new B-spline scheme, we need to have 
a finer triangulation over the areas that need to be re¬ 
fined. A finer triangulation will, in effect, provide us 
with more control vertices. We use a combination of 
the conversion algorithms from Section 4.3 to solve the 
problem. 

Suppose we are given a B-spline surface F over an 
arbitrary triangulation T and we want to refine the sur¬ 
face region Fj that’s restricted to A(/) £ T. First, Fj 
is converted into a piecewise Bezier surface Fj using 
the latter algorithm in Section 4.3, Then, the result¬ 
ing Bezier surface Fj can be converted back into B- 
spline representation using Theorem 4.1. Hence, a finer 
triangulation with control vertices over A(/) has been 
produced. This technique also lends itself to recursive 
refinement (i.e. refined areas may be further refined, 
etc). 


5 Examples 

This section illustrates examples of quadratic surfaces 
produced from our test implementation. The creation 
of the triangulations and the positioning of the knots 
for the surfaces were all done manually — no automatic 
procedure was involved. However, the B-spline editor 
was used to position the control vertices. The examples 
show advantages and applications of the new B-spline 
scheme. 

Figure 5 shows the advantage of converting a C l - 
continuous piecewise polynomial quadratic Bezier sur¬ 
face into a quadratic B-spline surface. The movement 
of a Bezier control point will generally destroy the con¬ 
tinuity of the surface (Fig. 5(b)), but the movement 
of a B-spline control point will preserve the smoothness 
and C 1 -continuity throughout the entire surface (Fig. 
5(d)). Thus, when designing an object, one does not 
need to worry about preserving its smoothness, but, 
can concentrate solely on designing its shape. 

Figure 6 shows two examples of an application to the 
polygonal hole problem. This problem involves a degree 
k piecewise polynomial surface containing an interior 
hole. We wish to patch up the hole such that the over¬ 
all smoothness or continuity of the surface is preserved 
(especially around the boundary of the hole). The idea 
of the solution is the following. We first represent the 
piecewise polynomial surface around the hole as a lin¬ 
ear combination of B-splines (Theorem 4.1). Then, this 
B-spline surface is extended into the hole to produce an 
overall C^^-continuous fill of the hole. 

6 Conclusion 

The new B-spline scheme offers a method of modelling 
complex and irregular objects over arbitrary tri angula¬ 
tions. Smoothness, locality and the modelling of discon¬ 
tinuities are inherited from simplex splines while con¬ 
trol points, affine invariance, and the representation of 
piecewise polynomials are obtained from B-patcheB. 

The implementation that is presented in this paper 
has succeeded in demonstrating the practical feasibil¬ 
ity of the fundamental algorithms underlying the new 
surface scheme. Quadratic and cubic surfaces over ar¬ 
bitrary triangulations can be edited and rendered in 
real-time. Applications like the filling of polygonal holes 
demonstrate the potential of the new scheme when deal¬ 
ing with concrete design problems. Further improve¬ 
ments to our editor that simplify user input and addi¬ 
tional applications are currently under way. 
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Bezier control point is moved, the C 1 -continuity of the surface is destroyed, and a sharp edge is introduced 
(b). If a B-spline control point is moved, the C 1 -continuity of the surface is preserved (d). 


Graphics Interface ’92 








Figure 6: Solving the polygonal hole problem using triangular B-splines: (a) a C 1 -continuous quadratic 
surface containing a 3-sided interior hole, (b) the hole in (a) is filled and the resulting surface is still overall 
C^-continuous. (c) and (d) same problem but with a quadratic surface containing a 5-sided hole. 
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Abstract 

The extension from interactive 2D wireframe 
geometry to interactive solid geometry has been 
for some time one of the goals of Computer 
Graphics . Our approach to this objective is the 
utilization of a computational representation of 
geometric sets that we believe is better suited to 
geometric computation than alternatives inherited 
from mathematics. This representation is the 

Employing such a 
representation leads to simpler and faster 
algorithms and has enabled us to construct an 
elementary interactive solid geometry system 
which can execute effectively on workstations with 
no graphics acceleration hardware. Interactive 
manipulation of a collection of polyhedral objects is 
provided utilizing set operations, affine 
transformations, collision detection, picking and 
dragging, calculation of metric properties, 
rendering of transparent objects, and solid clipping 
to an arbitrary polygonal view volume. This is the 
first such system based on partitioning trees, and 
as a consequence, is the first interactive system 
that fully supports set operations, collision 
detection and transparency for arbitrary 
polyhedral objects. 

Introduction 

Real-time manipulation of 3D geometry has since 
its inception been one the objectives of computer 
graphics. But only limited success has been made 
in achieving this goal in the almost three decades 
that have passed since the creation by Ivan 
Sutherland of the 2D interactive design program 
Sketchpad. We believe one of the crucial factors for 
this is the representation. Given any semantic 
domain, such as geometry, for which we wish to 
construct a computational object, the selection of 
the representation largely determines the 
algorithms required for implementing the 


operators of the domain. A good representation 
should yield simple and efficient algorithms, but 
this will happen only if the representation 
somehow captures the computational aspects of 
the semantic domain. While it is possible to invent 
new data structures optimized for each new 
problem, this approach, if used in the creation of 
an integrated software system, is quite impractical. 
It is also impractical if one is interested in 
hardware acceleration and/or parallelization. 
Rather support of only a few important 
representations is plausible. Such is the case for 
numbers, where we have only a few integer and 
floating point representations. 

Boundary representations (b-reps), essen¬ 
tially inherited from mathematics, represent 
polyhedra in terms of topology: i.e. connected 
components and incidence. No where is the notion 
of a process, i.e. computation, overtly manifested, 
and the view is one of an empty space populated 
by objects. In contrast, the binary space 
partitioning tree , also bsp tree or partitioning tree , 
induces a structure on space that reflects the 
information density of the geometry in that space. 
It does this via recursive subdivision of space by 
hyperplanes which results in a hierarchy of 
convex regions. The combinatorial representation 
of this is a binary tree, which can also be 
interpreted as a computation graph or decision 
tree. Thus computation is intrinsic to the 
representation. Partitioning trees provide an 
efficient method of answering spatial relationship 
questions such as those needed for set operations 
and visibility determination; in contrast, topological 
representations have, by abstraction, removed 
this information from the representation since 
homeomorphisms do not change its structure. 

Partitioning trees originated as a method of 
pre-processing a polygonal database to facilitate 
hidden surface removal (see [Schumacker et al 69] 
and [Fuchs, Kedem and Naylor 80]). Their first use 
for interactive viewing of solids was described in 
[Fuchs, Abrams and Grant 83]. In [Thibault and 
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Naylor 87] partitioning trees were extended to 
provide representations of polytopes, and 
algorithms were presented for converting a CSG 
expression on b-reps into a partitioning tree and 
for performing a set operation between a 
partitioning tree and a b-rep by modifying the 
partitioning tree. This later capability was the basis 
for an interactive solid modeling program in which 
the user sculpted a work-piece with a tool; the 
work-piece was a partitioning tree and the tool was 
a convex b-rep [Naylor 90]. Solid near-plane 
clipping was also incorporated. In [Chin and Feiner 
89], a partitioning tree based shadow algorithm was 
presented, and its effectiveness was demonstrated 
through an interactive program. 

The system described in this paper is a 
successor to [Naylor 90] and represents an entirely 
new software effort based on work presented in 
[Naylor, Amanatides and Thibault 90] which 
provided set operations between arbitrary 
polytopes via merging of partitioning trees. This 
new system allows any number of non-convex 
polyhedra each represented by a partitioning tree. 
Set operations can be performed between any pair; 
thus the restriction in [Naylor 90] to a single non- 
convex work-piece operated on by pre-defined 
convex tools is removed. In addition, arbitrary 
affine transformations can be applied to any object 
and metric properties, such as volume and center 
of mass, can be calculated. 

The system described in this paper is the first 
interactive system using only partitioning trees for 
representing polyhedra. Two significant benefits, in 
addition to set operations, accrue from this: 
inexpensive collision detection and non-refractive 
transparency. We believe our system is the first 
interactive solid geometry system with such 
capabilities. 

User View 

The user is presented with a world composed of a 
set of polyhedral objects, with access to this world 
through a user-interface built upon a 3-button 
mouse and pop-up menus. Interactive control of 
objects and views for models comprised of several 
thousand faces is possible, even on workstations 
with no graphics drawing hardware. 

The geometry of any object can be modified by 
applying affine transformations or by performing 
set operations between it and other objects. Visual 
attributes, such as color, opacity and texture, can 
be selected as well. Measurement of any object is 
also available: this includes computing its volume, 
mass, surface area, center of mass, moments of 
inertia, and axis-aligned extents. The object editing 
operations include choosing a new object from a 
pre-defined set, copying an object, deleting an 
object, reading an object from a file, and writing an 


object to a file. Finally, its combinatorial 
representation can be displayed (a binary tree). 

There is at any one time a single "current" 
object that is the operand of all unary operations on 
objects, and we refer to this current object as the 
tool-object. The user may at any time select any of 
the objects as the tool-object via picking (with the 
mouse). The tool-object is also one of the two 
operands to any set operation, the other operand 
being any object in which it is in contact. The tool- 
object is used as a tool to modify these "in-contact" 
objects, but the tool-object itself is unaffected. A 
special sweep mode is provided in which the tool- 
object is subtracted from each in-contact object as 
the user moves the tool-object. 

Set operations 

All polyhedra are represented by partitioning 
trees, and set operations are performed by merging 
trees [Naylor, Amanatides and Thibault 90]. Figure 
1 gives a simple example suggesting how this could 
occur. Hyperplane normals point to the positive 
halfspace which is associated with the right-child of 
an internal node and the negative halfspace is 
associated with the left-child. At the leaf-nodes, a 1 
indicates an in-cell and a 0 an out-cell. Set 
operations are achieved by merging two trees. This 
can be thought of in terms of inserting one tree into 
the other via a recursive algorithm. At each 
internal node v, the inserted tree T is split by the 
hyperplane associated with v, and the positive and 
negative "halves" of T are inserted into the 
respective subtrees of v. This continues recursively 
until a cell is reached. If the operation is, for 
example, union, then if the cell is an in-cell, the 
inserted tree is discarded; otherwise, being an out- 
cell, the cell is replaced with the subset of T that has 
reached/lies-in this cell. 

Besides the traditional set operations of union, 
intersection and difference, we have included a 
symmetric difference corresponding to the boolean 
not-equal or exclusive-or. The result of this 
operation when viewed from the outside looks like 
union, but where the two objects intersect, a cavity 
is created which may be seen after subsequent set 
operations, or temporary cut-aways, or whenever 
the result is semi-transparent. 

As suggested in Figure 1, each in-cell of a 
partitioning tree has an associated set of attributes 
defining the value of space within that cell (the 
polka-dots and cross-hatches). Since we are 
modeling physical objects, we use attributes 
describing those physical properties of the material 
being modeled that are relevant to our geometric 
operations. Thus color, opacity, density, etc., are 
specified for each in-cell, but not for out-cells. 
Polyhedra are then viewed as functions from 3- 
space to an attribute space in which the domain of 
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A union-operation using BSP Trees 
Figure 1 


the function is confined to the polyhedra. That is, 
for polyhedron P, point X, and attributes A, we 
have the function f : X e P -» A, and for X € P, f is 
undefined (or maps to the NULL value). 

When performing union or intersection, two 
possibly different attributes are defined wherever 
the two objects intersect, but the result must 
specify only one set of attributes in any region of 
space (i.e. we want a function). We currently have 
adopted the policy of attribute precedence: the 
resulting attribute is simply that of the operand 
with greater precedence, and we always give 
precedence to the first operand (it seems oddly 
convenient that the two operations for which this is 
an issue are commutative). An alternative policy 
would be to construct a new attribute from the 
blending of the two operand attributes. 

We have also provided support for 
transparency ; and by this we mean the thin-film 
variety, as with colored cellophane, that can be 
treated as an absorbing but not refracting medium. 
(This is also to be distinguished from "screen-door” 
transparency provided by some z-buffer based 
systems.) The requisite blending calculations are 
performed during polygon scan-conversion using 
an ’’alpha-channel” to hold the opacity of a point on 
those workstations having such capability. Since the 
in-cells of a partitioning tree may differ in their 
material attributes, and since reflection and 
refraction are considered to occur wherever the 
electromagnetic properties of space change 
discontinuously, visible boundaries can be present 
in the interior of a polyhedron, unlike opaque 
polyhedra. What is more, such a boundary can be 
visible from both sides, although with differing 
filtering properties, if the intervening space is 
sufficiently transparent. (The boundary between a 
red region and a green region will appear, if looking 


from the green region as red, but will appear as 
green if looking from the red region.) 

The consequence of this is the need for explicitly 
creating interior faces with the appropriate 
attributes for both sides of each face; and this we 
do, although only for rendering environments that 
can support transparency. While the creation of 
interior faces would seem to be unnecessary 
whenever both sides of a boundary are fully 
opaque impling the boundary is never visible, any 
subsequent reduction in the opacity will reveal the 
interior boundaries. Thus we do not make this 
optimization. 

Another geometric operation of considerable 
utility that we have incorporated is collision 
detection. While the temptation is to think of 
collision detection as merely intersection, we take 
the view that unlike intersection the returned 
value is essentially symbolic rather than geometric. 
Consider two "macro objects", molecules for 
instance, that are created from the union of many 
individual components, such as atoms and bonds. 
When we answer the collision question, we generate 
a list of all pairwise collisions between the 
components from the two macro objects. That is, if 
the component names (implemented, for example, 
as pointers to their representation) are considered 
to serve as the labels for nodes of a collision graph , 
then what we produce are the arcs of this graph. 
(The arcs are directed, going from a component in 
the first operand to a component in the second; 
thus the name space of the two operands need not 
be disjoint.) 

The advantage of this symbolic approach is that 
it separates the mechanism of collision detection 
from the policy of what action to take as a 
consequence of any collisions. Below in the section 
Interactive Techniques, we describe two different 
policies for two different circumstances, neither of 
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which is concerned with the geometry of the 
intersection. Such a separation is made affordable 
because we perform collision detection as a small 
additional overhead to any set operation, and as we 
discuss below in the section Modeling, we use union 
to create the model each time the tool-object 
moves. The cost for collision detection is small 
because the algorithm for any set operation 
recurses until at least one operand is reduced to a 
cell. If this is an in-cell, then the additional work is 
no more than searching the tree of the other 
operand for other in-cells (containing identifiers) 
and the appropriate collision arcs are generated. We 
gain some efficiency by maintaining an identifier at 
any internal node whose region contains only one 
identifier (a list of identifiers at internal nodes is 
another possibility). The arcs are maintained as a 
set; that is, there are no duplicates. 

Affine Transformations 

Affine transformations are an elementary 
geometric operation whose implementation has 
customarily been treated as synonymous with 4x4 
matrices. However, notable benefits can be gleaned 
if they are treated in a more sophisticated way. An 
important motivation for this is the need to provide 
an interactive user with intuitive ways to modify 
an object using affine transformations. 
Traditionally, we think of transformations being 
applied in the order in which they are specified; this 
is typically achieved by maintaining a composite 
matrix so that a new transform modifies the 
composite. While this policy is indeed the right one 
in many situations, it is not ideal for interactive 
object modification. 

Instead, we have found from subjective 
experience that the primitive transformations 
should be applied in the following order: scaling, 
shearing, rotation and translation. Note that the 
transforms which modify distance, i.e. scaling and 
shearing, appear before the rigid body transforms. 
Consequently, we maintain each of the four 
primitive transformations separately and multiply 
them together each time their composite value is 
demanded, replacing the previous composite 
matrix. So for instance, a rotation operation 
modifies only the rotation matrix rather than the 
composite matrix. An additional consideration is the 
fact that all linear transforms have fixed points: the 
origin plus possibly some subset of coordinate axes. 
Changing these fixed points is achieved by 
employing an additional change-of-basis matrix. 
Thus the composite matrix resulting from the 
product of the primitive transforms is pre¬ 
multiplied by the change-of-basis matrix and then 
post-multiplied by its inverse. One then has the 
freedom to choose any local coordinate system for 
an object. 


A second advantage of implementing an affine 
transformation class as something different from a 
matrix is the ease with which optimizations can be 
performed transparently. The greatest disparity 
among the primitive transforms is the much 
greater speed at which translation can be effected. 
Since translation appears in an interactive 
environment with considerably greater frequency 
(moving objects about), detecting when an affine 
transformation is no more than a translation is an 
effective and simple optimization. Similar 
considerations also lead to determining when 
transformed normals need re-normalization. Rigid 
body transforms do not require this and symmetric 
scaling necessitates only divisions, not the 
calculation of the length of the new normal which 
entails computing a square root (profiling 
confirmed that indiscriminate re-normalization 
resulted in significant overhead). A final 
optimization is the generation of composite and 
inverse matrices only on demand rather than after 
every change to the affine transformation. The 
routines that modify affine transformations mark 
them as not up-to-date; and routines that use 
them for transforming points, normals and 
hyperplanes first requests an up-to-date composite 
and inverse matrices, which are generated only the 
first time they are demanded (if the affine 
transformation is not utilizing the pre-defined 
ordering of primitive transforms, the composite will 
already be up-to-date, so only the inverse needs to 
be computed). 

Picking 

A classic operation dating to the days of random- 
scan vector-refresh displays is that of picking an 
on-screen object using the current cursor position 
(originally a light pen). We achieve this by ray- 
casting. In screen-space the cursor position 
provides the initial x and y values of the ray origin, 
and its z-value is the screen-space position of the 
near clipping-plane (0 in our system). The ray 
direction is that of a screen-space projector whose 
length is the distance between the near and far 
clipping-planes in screen-space ( [0 0 1] in our 
system). The ray, with t m i n = 0 and t ma * = 1, is then 
mapped to model-space by the inverse of the 
model-to-screen space transformation. We can now 
use the ray-tracing algorithm for partitioning trees 
[Naylor and Thibault 86] that finds the first 
intersection point within t m i n and t ma x. This 
returns the model-space intersection point, the 
surface normal, the near and far attributes, the 
classification of the ray segment from the origin to 
the intersection point (in or out), and most 
importantly the identifier. Generating an 
equivalent "pick report" using standard methods 
usually entails clipping the entire model to a tiny 
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window. Our method reflects directly the nature of 
the query and is more efficient due to the 
partitioning tree’s search structure. 

Modeling 

The representation of each user-level object entails 
three geometric sets, each represented by a 
partitioning tree. In particular, user-level objects 
are represented by an object-instance class 
containing the object definition, its value prior to 
the last set operation, an instance of the definition, 
and an affine transformation which determines the 
instance in terms of the definition (along with some 
minor state information). 

As we mentioned in the user-view section, the 
model is a set of user-level objects one of which is 
distinguished as being the tool-object; the 
remaining objects we refer to as the static-objects. 
Any time a new tool-object is selected, a union of 
the static-objects is created as a distinct geometric 
set (represented by a single partitioning tree). A 
copy of each object's instance is used for this. The 
model is then the union of two sets: a copy of the 
static-objects and a copy of the tool-object. Any 
time the tool-object is modified, say by moving it, 
the previous model is discarded and a new one 
constructed; but note that the static-objects 
representation remains the same and so is reused 
as long as this set does not change (by set 
operations or picking a new tool-object). 

Since we are using this in an interactive 
environment, these operations must be relatively 
quick. Two techniques are used to accelerate their 
execution. The first is the construction of good 
partitioning trees (Naylor 92]. This subject is 


somewhat complex, and so we will not address it 
here. But the essence is that we use expected case 
models to build partitioning trees that represent an 
object something like a sequence of approximations. 
An artifact of this process is the creation of a first 
level approximation that corresponds to the classic 
technique of bounding volumes. Figure 2 illustrates 
the effect of this when forming the union of two 
disjoint sets; in this case the computation is 
equivalent to forming the union of the bounding 
volumes, and so is very fast. 

The second technique is a classic one: reference 
counts. Copying a tree has the effect initially of only 
incrementing the reference count to the root of the 
tree. Actual duplication of a tree node occurs only 
when it or its subtrees are modified. During set 
operations, any subtree of one tree lying within a 
cell of the other tree does not need to be duplicated 
in order for it to be included in the result. 

Returning to figure 2, if the two trees were 
intended to be copies, only the first three nodes of 
each tree will be duplicated, while the subtrees 
indicated by filled triangles would remain 
untouched. Thus copying and discarding can 
require only sub-linear time. Reference counts are 
used as well for copying of polygons and attributes. 
One consequence of this for attributes is that the 
same allocated attributes will be shared by the 
object definition, its instance, and the copy of the 
instance used to form the model. Thus modifying 
the attributes of the object definition also achieves 
this modification for its instance and the copies of 
the instance present in the static-objects tree and 
in the model tree without the need for any 
additional work (other than rendering the new 
image). 





Union of disjoint objects 
Figure 2 
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Rendering 

Given a partitioning tree representation of a 
collection of sets, for any viewing position a total 
visibility priority ordering of the sets can be 
generated by a single view-dependent traversal of 
the tree [Fuchs, Kedem and Naylor 81]. Since we 
have constructed a single partitioning tree 
representation of the entire model, we can solve 
the visible surface problem by performing the 
ordering operation on this tree. 

There are several advantages to using the 
partitioning tree for visibility instead of the 
commonly used depth-buffer algorithm. First, the 
ordering is generated in model-space as opposed to 
the discrete, post-perspective screen-space. Thus, 
no depth-buffer is needed to represent this 
discrete space and surface continuity (coherence) is 
exploited to avoid performing ordering calculations 
for each pixel. This has contributed to the ability of 
our system to provide interactive solid geometry 
on workstations with no graphics hardware per se 
and requires only the presence of 2D convex 
polygon drawing routines. In addition, the 
information loss due to the perspective division 
using floating-point is avoided. Thus, for sets which 
intersect, or nearly intersect, their ordering will 
not vary incorrectly in the neighborhood of their 
intersection when small changes in the viewing 
position are made (this is especially a problem with 
polygons that are close and almost parallel). One 
consequence of this is that the edges of a polygon 
can be drawn "on top of the polygon without any 
subsets of their discrete representation being 
occluded. 

A second class of advantages of a visibility 
ordering arises from the desire to render non- 
refractive transparent objects and to perform anti¬ 
aliasing. Both of these operations can be 
implemented at the pixel level by blending the 
colors of two pixels. Given only two polygons (or 
pixels) pi and p2, in which pi has color ci and 
opacity ai and occludes p2 which has color C2 and 
opacity a2, then the resulting color is ci,2 = (ci* 
ai) + (C2* OL2) * (1- ai) and opacity is ai,2 = ai + 
(1-ai) * a2- Anti-aliasing using a box filter can be 
treated analogously if the polygon is modeled as 
covering the entire pixel but with opacity equal to 
the area of the pixel covered by that polygon (this 
is in lieu of computing the visible surface at the 
sub-pixel level using masks). The correct use of this 
technique requires that blending occur only 
between pixels that are consecutive in the priority 
order, which cannot be accomplished with the 
standard depth-buffer algorithm. However, a 
multi-pass two-buffer algorithm is known 
[Mammen 89] where the number of passes equals 1 
+ the maximum number of overlapping trans¬ 
parent polygons. While transparency can be 
performed with either a far-to-near or near-to-far 
priority ordering of the partitioning tree, anti¬ 
aliasing requires the near-to-far ordering in order 


to avoid blending in occluded surfaces. Such an 
ordering is also needed for maintaining sub-pixels 
masks, if available, that are required for higher 
fidelity anti-aliasing, and for avoiding the 
calculations at fully occluded pixels (when ai = 1) 
required by texture mapping or Phong shading. 

Another major component of the rendering 
operation that can be achieved simply and 
efficiently using partitioning trees is cli pping . View- 
volume clipping is nothing more than forming the 
intersection of the view-volume with the model. 
Thus, we generate a partitioning tree 
representation of the view volume and execute the 
standard partitioning tree intersection algorithm, 
but with the following optimization. Since the 
results from clipping will not, in an interactive 
environment, be used in any subsequent set 
operations, there is no need to produce the 3D 
geometry of the intersection explicitly. Only the 
intersection of the model's faces with the view 
volume is required. Consequently, the operation 
does not modify the tree representing the model 
and so avoids all copying accept for those faces that 
intersect the boundary of the view volume. 

There are three advantages to partitioning tree 
clipping. First, the efficiency of set operations is 
enhanced by the expected case performance of the 
partitioning tree when thought of as a search 
structure. Roughly speaking, one might expect for n 
faces O(log n) operations instead of the O(n) of the 
standard implementation of clipping. For example, 
any object that is entirely inside or entirely outside 
the view-volume will be clipped by "visiting" only 
the top level nodes forming its bounding volume. 
Thus, culling is achieved without any explicit notion 
of culling just as there is no explicit notion of 
bounding volumes (and so no code to implement 

them); it arises from building good trees for efficient 
set operations. This contributes to the effectiveness 
of the system when used on standard workstations. 
Secondly, near-plane clipping does not reveal a 
polyhedron as an empty shell but rather maintains 
the semantics of solids. Not only are the faces of the 
view-volume lying within objects displayed, but 
they have the attributes of the region which they 

bound. (Note that b-reps do not typically have cells 

with attributes required by this.) This permits 
near-plane clipping to be used to provide cut¬ 

aways as a visualization aid. Thirdly, the 
partitioning tree clipping algorithm supports 
arbitrary polyhedra l view volumes just as easily as 
the traditional truncated pyramid. This can be put 
to good use in an environment with multiple 
overlapping viewports with simultaneous 
renderings in each. Portions of a model occluded by 
another viewport can be clipped away by using the 
appropriate view-volume. 

Interactive Techniques 

Given the above capabilities, we have chosen sev¬ 
eral interactive techniques that enhance the usa- 
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bility of the system. One is the use of collision detec¬ 
tion to provide hi ghlighting via transparency. 
Forming the union of the tool-object with the 
static-objects, like any set operation, can generate a 
collision report. We can exploit this so that when¬ 
ever the tool-object makes contact with any object 
in the scene, the contacted objects have their 
opacity reduced, and then subsequently restored 
when contact ceases. For rendering systems sup¬ 
porting transparency, this has the effect of not only 
indicating contact, but also allowing the user to see 
the tool-object as it moves through the interior of 
the contacted objects. For rendering systems with¬ 
out transparency, the reduction in opacity appears 
instead as a reduction in luminance, since the per¬ 
centage of reflected light decreases with a decrease 
in opacity. In such environments, using the near- 
plane clipping as a cut-away can reveal the position 
of the tool-object as it moves through the interior. 
(Note that this visualization technique requires that 
of the tool-object’s attributes take precedence over 
the static-objects when forming the model.) Cut¬ 
aways also provide a means of inspecting the 
otherwise occluded interior parts of objects. 

A second use of collision detection is for 
operand selection in the execution of a user speci¬ 
fied set operation. The set operation is performed 
between the tool-object and only those objects with 
which it is in contact. This method is, from the 
user's perspective, simple and intuitive. Not having 
some method of selection is unacceptable since, for 
example, intersection between the tool-object and 
objects with which it is not in contact would annihi¬ 
late those objects. 

Those objects which are modified by a set 
operation will also have their center of mass and 
principal axes recomputed, after which their object 
definition is translated and rotated so that the cen¬ 
ter of mass and the principle axes form the object's 
local coordinate system while the instance reamains 
stationary. Thus affine transformations are always 
with respect to an objects "natural” coordinate sys¬ 
tem. This is very important if one wishes to main¬ 
tain an intuitive sense of the effects of affine trans¬ 
formations when applied to objects whose geometry 
has been significantly modified by a set operation. 

Picking provides a direct geometric method of 
tool-object selection, and implementing this with 
ray-casting makes the selection precise. Dragging of 
the tool-object is achieved by mapping the transla¬ 
tion vector defined in screen-space into model- 
space. This requires that the screen-space vector 
have a depth, which the mouse does not provide. 
For this, we use the screen-space depth value of 
the picked surface point. As a consequence the 
picked position always tracks the cursor exactly (as 
long as the view is not changed). A similar operation 
is placing , in which the user picks a source point on 
the tool-object and then picks a target point on any 
other object, while during the interim changing the 
view if necessary to make the target point visible. 


The source point is then translated to the target 
point. 

Finally, we use a variety of other minor tech¬ 
niques to improve usability. Tool-object translation 
and view rotation via mouse translation is always 
with respect to screen-space; so, for example, left- 
right mouse motion maps to horizontal motion in a 
plane parallel to the screen. We also adjust the view 
rotation rate as a function of window size so that 
the subjective sense of the rate remains constant. 
Symmetric scaling of the tool-object increases as a 
constant percentage of the current scale factor, 
while differential scaling changes by a constant step 
size independent of the current scaling (empirically 
ascertained preference). Delta changes generated 
by holding down buttons have a slight acceleration 
and are adjusted to compensate for varying frame 
update rates; otherwise they would be too slow for 
long update times and too fast for short ones. And 
shearing preserves basis vector length rather than 
volume; otherwise surface area increases un¬ 
boundedly even though the volume remains the 

same. 

Comparison to Buffer Algorithms 

The ability to transform and merge partitioning 
trees quickly has obviated the original restriction in 
which trees were only created during a pre¬ 
processing phase. Since affine transformations and 
perspective projections (or more generally d+1 
dimensional linear transformations) do not change 
a tree’s structure, off-line generation of good trees 

leads to efficient merging of instances of these trees, 

resulting in reasonably good new trees. Thus, we 
have only relaxed, not abandoned, the original pre¬ 
processing context, and this is a crucial point. For 
the original idea was to exploit time-invariant prop¬ 
erties of the geometry to reduce computation. Thus 
complex but affinely invariant objects can still have 
their trees created once as a pre-processing step. 
Interactively created objects can also have their 
trees improved by off-line reconstruction. 
However, this investment can now be reaped in a 
much less restrictive environment, since these 
trees can be transformed and merged. 

As the system described in this paper illus¬ 
trates, the pre-computation yields efficient set 
operations, collision detection, clipping, visible sur¬ 
face determination, transparency, anti-aliasing and 
picking. Let us compare these capabilities to the 
screen-space approaches for polyhedral objects 
which are extensions in the spirit of the z-buffer 
algorithm (as opposed to scan-line algorithms). In 
these methods, typically no pre-computation is 
utilized and each object is sampled independently. 
Screen-space evaluation of set operations requires 
a representation of the geometric model in every 
ID sub-domain defined by the projector for each 
pixel (or sub-pixel); this, of course, is the same 
methodology as ray-cast evaluation of CSG models. 
Assuming retention of the scan-conversion modus 
operandi, as opposed to ray-tracing, this requires 
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decomposing an object into a set of segments with 
attributes and identifier (for collision detection) 
instead of simply pixels. For convex primitives, this 
is not difficult, but for arbitrary polyhedra, no effi¬ 
cient scan-conversion type algorithm is known. The 
resulting segments must then be merged and eval¬ 
uated as defined by the CSG tree (note that set 
operations require knowing the value of an 
operand in every projector, not just those inter¬ 
secting the interior as would be discovered by 
scan-conversion). This will yield an ordered list of 
segments that can, for the purposes of trans¬ 
parency, be composited. The accuracy of this 
computation is affected not only by the sampling, 
but also by the rather considerable non-linear 
compression of the depth. This is in addition to the 
difference in accuracy of discrete screen space vs. 
continuous model space computations; a point of 
some importance in non-interactive geometric 
computation, such as interference detection and 
mass property calculations. Anti-aliasing requires 
all of this work to be performed at the sub-pixel 
level if a correct image is to be generated, adding a 
factor of 4 or 16 increase in the per pixel computa¬ 
tion. Clipping is O(n), although this can be reduced 
by an initial culling of objects by comparing their 
bounding box to the view-volume. However, clip¬ 
ping unculled objects is still O(n) and the view vol¬ 
ume is restricted to being convex, so that in a 
window system, subsets of the model occluded by 
another window would not be eliminated by clip¬ 
ping (however, solid clipping, i.e, cutaways, could be 
achieved with this proposed system). Picking has 
often been implement as an additional O(n) clipping 
step as opposed to our ~0(log n) ray-cast, although 
with this proposed schema, the necessary 
information would already be present in the buffer. 

Having done all this, the result is an image 
identical to the one generated using partitioning 
trees. However, evaluation of set operations would 
be repeated for every frame at every sub-pixel, 
even though an object created using set operations 
may be affinely invariant for an arbitrarily large 
number of frames (not just seconds, but possibly 
days or years). So let us assume that screen-space 
algorithms are not used for set operations. Then for 
collision detection and transparency, we still must 
have the ordered set of segments for each sub¬ 
pixel, since we are effectively computing the union 
of the objects. The comparative cost of this opera¬ 
tion suffers both from the absence of pre-process¬ 
ing acquired temporal-correlation and from the 
non-exploitation of spatial-coherence: instead of 
determining spatial relations between volumes, the 
same relation will be repeatedly computed for 
every ray that intersects those volumes (scan-line 
algorithms, of course, can exploit coherence). 
Parallelization of the buffer-based approaches is not 
necessarily a panacea, since partitioning tree algo¬ 
rithms can be parallelized as well. Only in circum¬ 
stances where the presupposition of affine invari¬ 
ance over a sufficient period of time is violated. 


which is commonly the case for cartoon like anima¬ 
tions, would the buffer-based approaches be 
arguably superior for polyhedral models. 
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Abstract 

We present two integer-only algorithms to be used in tan¬ 
dem for rendering cubic functions and parametric cubic 
curves with rational coefficients. Analysis of execution 
speed of existing algorithms shows that our algorithms 
will match or outperform other current algorithms. 
Furthermore, while other existing algorithms can only 
handle curves shaped by rational coefficients by 
introducing some approximation error, our algorithms 
always choose the best approximation. Curves may have 
to be split before rendering, because each algorithm only 
handles curves with slope in a certain range. When 
plotting parametric curves, our algorithms may require 
more bits of representation for some integer variables than 
other existing algorithms. 

Keywords: display algorithms, curve representations, 
parametric curves, raster graphics 

1, Introduction 

Computer scientists have been developing line and curve- 
rendering algorithms for over 25 years. Only recently, 
however, have efficient algorithms for the plotting of cu¬ 
bic curves begun to appear. This paper will propose and 
develop two fast, integer-only algorithms that can be used 
in tandem to render cubic curves with rational coefficients 
defined by the function 

y = (A n /Ad)x 3 + (B n /Bd)x 2 + (C n /Cd)x + (Dn/Dd)* (1) 

The algorithms are based on the midpoint method, 
described by Van Aken and Novak in [17] and below. 

2. History And Existing Algorithms 

J.E. Bresenham was the first to present a fast, integer-only 
line rendering algorithm in 1965 [1]. Research in line ren¬ 
dering since has seized on the periodic patterns shown by 


Bresenham's algorithm when viewed on a raster display as a 
means of improving algorithm speed [4,14], 

Algorithms for rendering circles began to appear in the 
1970s. Bresenham [2,3], Horn [7], and Mcllroy [12] have 
all presented algorithms. Later, algorithms for rendering 
ellipses were published [9,15,16], and more recently, algo¬ 
rithms for the plotting of parabolas and hyperbolas were 
presented [13,15,18]. 

Algorithms for the rendering of cubic curves have only 
begun to appear in the last few years. In [11], Klassen pre¬ 
sented two algorithms for rendering parametric cubic 
curves. First he identified the family of Bezier curves that 
are "worst-case”, meaning that they are most likely to 
cause overflow during calculation. If 2h is screen length or 
width, Klassen asserted that "worst case" curves would have 
the four Bezier control points [-/t,5A,-5/i,A] in at least one 
dimension, which would describe the one-dimensional 
parametric Bezier cubic 32 ht^ - 48 hfi + 18 ht - h. Klassen 
called such curves S curves. Klassen then presented his two 
algorithms and outlined their relative speed and overflow 
restrictions for worst-case curves. Algorithm A uses a 
fixed-point representation of curve coordinates, and thus 
incorporates an inherent level of error. However, it is fast 
and has a liberal overflow restriction. Algorithm B divides 
forward differences into integer and fractional parts, 
providing perfect accuracy. But it is slower than algorithm 
A, and can only take 1024 parametric steps if overflow is 
to be avoided with 32-bit words. Both algorithms allow 
arbitrary step size and do not restrict curve segments to 
certain slope octants. Both can be used with non-integer 
coefficients. However, use of such coefficients with 
algorithm B would eliminate its perfect accuracy. 

In [10], Klassen studied the use of these two algorithms 
with cubic spline curves. He envisioned the use of the 
algorithms with adaptive forward differencing [6,8], which 
dynamically adjusts step size as a curve is plotted. 
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Figure Is Elimination of the constant 
term can be compensated for by a 
translation. 

o o o 
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Figure 3: If Islopel < 0 and X and Y 
plotting directions are positive the 
candidate points are (X+1,Y) and 
(X+l, Y+l). 


Simultaneous to Klassen, Chang et al. [5] developed an 
algorithm similar to Klassen's algorithm B that also could 
be used with adaptive forward differencing. Differences 
between the two algorithms are minor. 

3. Preliminaries 

3*1. Elimination Of The Constant Term D n /Dd 

Since the last term (D n /Dd) in (1) does not change the 
shape of the curve, we can render the curve described by 
instead rendering the curve 

y = (A n /Ad)x 3 + (B n /Bd)x 2 + (C„/Cd)x (2) 

with a compensating translation (see figure 1). Note that if 
the (D n /Dd) rational coefficient is not an integer, transla¬ 
tion of the Y coordinate at plotting by roundfDn/Dd) alone 
will not necessarily produce the best approximation of the 
curve. In section 4 we will discuss a method of compensat¬ 
ing for this inaccuracy. 


A 



Figure 2: If the decison function is 
evaluated at a point above the curve, it 
is negative. Otherwise it is positive. 


Table 1. The candidate and midpoints used depend on 
curve slope and X and Y plot direction. Points are listed in 
clockwise order. 


YPlot 

Dir 

X Plot 
Dir 

ISlopel 

Candidate Points 

Midpoints 

+ 

+ 

<1 

(X+l.Y); (X+l.Y+l) 

(X+l.Y+1/2) 

+ 

+ 

>= 1 

(X+l.Y+l); (X.Y+1) 

(X+1/2.Y+1) 

+ 

- 

>=1 

(X. Y+l); (X-l.Y+l) 

(X-1/2.Y+1) 

+ 

- 

<1 

(X-l.Y+l); (X-l.Y) 

(X-l.Y+1/2) 

- 

+ 

<1 

(X-l.Y); (X-l.Y-l) 

(X-l.Y-1/2) 

- 

+ 

>=1 

(X-l.Y-l); (X.Y-l) 

(X-1/2.Y-1) 

- 

- 

>=1 

(X.Y-l); (X+l.Y-1) 

(X+1/2.Y-1) 

- 

- 

<1 

(X+l.Y-l); (X+l.Y) 

(X+l.Y-1/2) 


3.2. The Midpoint Method 

The midpoint method, described by Van Aken and Novak in 
[17], requires the incremental evaluation of a decision func¬ 
tion that indicates which of two candidate pixels should be 
chosen for rendering. If the equation for a curve is y = f(x), 
then the decision function has the form d(x,y) = f(x) - y. 
Notice that this function will have a different sign on each 
side of the curve f(x) (see Figure 2). 

Where do we evaluate this function? This depends on the 
slope of the curve. If -1 < f (x) < 1 and we are plotting in 
positive X and Y directions, then if we have just plotted the 
point (X,Y), the two candidate points for plotting are 
(X+1,Y) and (X+l,Y+l). (See figure 3. Table 1 shows a 
complete list of candidate points.) The midpoint method 
evaluates the decision function at the midpoint between the 
candidate pixels. In our example, this midpoint is 
(X+l,Y+l/2), and thus we evaluate d(X+l,Y+l/2). (See Fig¬ 
ure 4. Table 1 shows a complete list of midpoints.) We 
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(X.Y) 
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Figure 4: If Islopel < 0 and the X and Y 
plotting directions are positive, the midpoint 
is (X+l,Y+l/2). 

will call the decision function the "decision variable" when 
it is evaluated at a midpoint. 

Since the sign of the decision function d(x,y) corresponds 
to a specific side of f(x), the sign of decision variable 
d(X+l,Y+l/2) indicates the side of f(x) on which the mid¬ 
point lies, and also which of the candidate pixels lies 
closer to the curve being plotted, f(x). In figure 4, the sign 
of the decision variable is negative, so the lower candidate 
pixel is chosen for plotting. 

Once the next pixel is chosen and plotted, the decision 
function must be evaluated at the next midpoint to allow 
the plotting of the next pixel. In our example (figure 5), 
the appropriate decision variable would be d(X+2,Y+l/2). 

3.3 Forward Differencing 

Simple evaluation of the decision function at each succes¬ 
sive midpoint would be computationally expensive. 
Fortunately, there is a method of incremental function eval¬ 
uation, called forward differencing, which is uniquely suited 
to our needs. This method, which was known to Newton, 
involves the initialization of several difference values that 
may be added together to produce the value of a function at a 
certain point. These difference values are then themselves 
incrementally evaluated, to prepare for the next evaluation 
of the original function. Note that mulitiplication is only 
required for function and difference initialization. 
Furthermore, if all function coefficients are integers, no 
floating point addition is required. 

As an example, consider the simple function f(x) = 2x + 1. 
f(x+l) differs from f(x) only by the constant difference 
value 2. By successively adding 2 to an initial value for 
f(x), we could incrementally calculate the value of f(x) at 
integer intervals on X. For the higher-order function g(x) = 
x^, the binomial expansion g(x+l) = (x+l)^ = x^ + 2x + 1 
gives us the first-order difference value 2x + 1 for an integer 
interval. Since this difference value is also dependent on 
X, it must also be subjected to forward differencing, as 


o o o o 



(X.Y) (X+l.Y) 


0.000 
Figure 5: With this curve, (X+1,Y) would be plotted, 
and (X+2,Y+l/2) used as the next midpoint. 

already discussed. Thus the incremental calculation of g(x) 
= x^ would require two additions per integer interval. 

4. The RunRise Algorithm 

Let us first find the decision function d(x,y) for equation (2) 
when -1 < f(x) <1. In this case, the X component of f(x) 
is larger than the Y component: the curve "runs" faster than 
it "rises." We will label segments of f(x) where this 
condition holds true "RunRise." Since we will only make 
use of the sign of our decision function, we multiply our 
cubic function (2) by 2AdBdCd to increase efficiency by 
eliminating the floating point division calculations. To 
conserve space, we use the shorthand A{ = A n BdCd, B| = 
Bn^dCd. Q = C n AdBd. and Di = AdBdCd, in the rest of 
this paper and the equation below. We assume without loss 
of generality that Dj (and thus the denominators Ad» B<j, 
and Cd) are positive: 

2D;y = 2Ajx 3 + 2Bjx 2 + 2Cjx. (3) 

We will find it useful to plot in both positive and negative 
X and Y directions. Our direction-flexible decision variable 
d(x±l,y±l/2) is then 

2Ai(x±l) 3 + 2Bi(x±l) 2 + 2Cj(x±l) - 2Dj(y±l/2) (4) 

where ± is positive if we are plotting in a positive 
direction, negative otherwise. 

We must evaluate (4) incrementally as we plot the RunRise 
portion of f(x). To avoid computationally complex multi¬ 
plications, we will use forward differencing. The difference 
constant d<)y is the difference between d(x,y) evaluated at 
the "current" Y coordinate, and d(x,y) evaluated at the 
"next" Y coordinate: 

d()y = d(x±l,y±3/2) - d(x±l,y±l/2) 

= 2Dj(y±3/2) - 2Dj(y±l/2) 

= ±2Di. 
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This difference constant is used to update the decision vari¬ 
able when a "Y step” is made — that is, when the last 
plotted pixel differs from the previously plotted one in its 
Y coordinate. 

Because the decision variable (4) is a third-order function in 
X, updating it when an X step is made is more complex. 
The second order difference function d2(x) is the difference 
between the decision variable at the current X coordinate 
and the next X coordinate: 

d 2 (x) = d(x±l,y±l/2) - d(x,y±l/2) 

= 2Ai(x±l) 3 + 2Bi(x±l) 2 + 2Q(x±l) - 2Aix 3 

- 2BJX 2 - 2Cjx 

= 2Ai(x 3 ±3x 2 +3x±l) + 2Bj(x 2 ±2x+l) + 2Ci(x±l) 

- 2Ajx 3 - 2Bjx 2 - 2C;x 

= 2Aj(±3x 2 +3x±l) + 2Bj(±2x+l) ± 2Cj 
= ±6Ajx 2 + (6A,±4Bi)x + (±2Ai+2Bj±2Cj). 

d 2 (x) must itself be subjected to forward differencing. The 
first order difference function dj(x) is the difference 
between the value of d 2 (x) at the current and next X 
coordinates: 

di(x) = d 2 (x±l) - d 2 (x) 

= ±6A;(x±l) 2 + (6Ai±4Bi)(x±l) - ±6kAjx 2 

- (6Aj±4Bi)x 

= +6A i (x 2 ±2x+l) + (6Ai±4Bi)(x±l) - ±6Aix 2 

- (6Ai±4B{)x 

= ±6Ai(±2x+l) + (±6Ai+4Bj) 

= 12Aix + (±12Ai+4Bi). 

Finally, di(x) must be subjected to forward differencing. 
The difference between di(x) evaluated at the current and 
next X coordinates gives us constant 6q x : 

d()x =di(x±l)-di(x) 

= 12Ai(x±l) - 12Ajx 
= ±12Ai. 

Stepping one pixel at a time, and using the global declara¬ 
tions below, we can construct a direction-flexible 
algorithm for plotting the RunRise segments of cubic 
curves. Figures 6 and 7 show the core of this algorithm in 
C. For brevity's sake, we have removed the variable and 
function declarations (all variables are long integers). The 
initialization of Aj, Bi, Q and Dj is not shown — it is 
common to all curve segments. 

In its loop, the RunRise algorithm performs 4 additions for 
each step in X, 2 additions for each step in Y. This gives a 
cost of 

4alXEnd-Xl + 2alYEnd-Yl, 


where (X,Y) and (XEnd,YEnd) are the endpoints of the plot¬ 
ted curve segment, and a the cost of one addition operation. 

Earlier, we noted that compensating for the elimination of 
the non-integer constant with translation will not 
necessarily produce the best approximation of the curve. 
Adding an appropriately signed round(Di * ((D n /Dd) mod 
1)) to the initial decision variable will simulate a fractional 
Y step and improve accuracy. 

5. The RiseRun Algorithm 

Now let us find the decision variable needed when If (x)l > 1. 
In this case, it is the Y component of f(x) that is larger, so 
the curve will "rise” faster than it "runs." We will label 
segments of f(x) where this condition holds true 
"RiseRun." 

As is clear from table 1, the directi on-flexible midpoint 
decision variable is d(x±l/2,y±l). Expanded, this is 

8Aj(x±l/2) 3 + 8Bj(x±l/2) 2 + 8Q(x±l/2) - 8Dj(y±l). (5) 

We have used the constant 8Dj rather than 2Dj in (5) to 
allow the integer performance of the half step x±l/2. (5) 
reduces as follows: 

= 8A;(x 3 ±3/2x 2 +3/4x±l/8) + 8Bi(x 2 ±x+l/4) 

+ 8Ci(x±l/2) - 8Di(y±l) 

= 8Aix 3 + (±12Aj+8Bi)x 2 + (6Ai±8Bi+8Cj)x 
+ (±Ai+2B;±4Ci) - 8Di(y±l) 

We use this all-integer equation to initialize our decision 
variable. The following forward stepping increments allow 
us to update that decision variable: 

d 2 (x) = d(x±3/2,y±l) - d(x±l/2,y±l) 

= 8Aj(x±3/2) 3 + 8Bj(x±3/2) 2 + 8Cj(x±3/2) 

- 8Ai(x±l/2) 3 - 8Bj(x±l/2) 2 - 8Q(x±l/2) 

= 8 A j (x 3 ±9/2x 2 +27/4 x±27/8) + 8Bi(x 2 ±3x+9/4) 

+ 8Cj(x±3/2) 

= 8Aj(x 3 ±3/2x 2 +3/4x±l/8) - 8Bj(x 2 ±x+l/4) 

- 8Cj(x±l/2) 

= ±24Ajx 2 + (48Aj±16B0x + (±26Ai+16Bi±8Ci) 

dl(x) = d 2 (x±l) - d 2 (x) 

= ±24Aj(x±l) 2 + (48A;±16Bj)(x±l) - ±24Ajx 2 

- (48Ai±16Bi)x 

= ±24Ai(x 2 ±2x+l) + (48Ai±16Bi)(x±l) -±24AiX 2 

- (48Ai±16B0x 

= ±24Ai(±2x+l) + (±48Ai+16Bj) 

= 48Aix + (±72Ai+16Bj) 
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XDec2Constl 

*/ 


Ai«l; 


Bi«l; 


XDec2Const2 
*/ 

XDeclConst = XDec2Const2<<l; 
*/ 

XDec2Const3 * Ci<<l; 

*/ 

DecConst = Di; 

YDecConst = DecConst<<l; 


/* 

Used 

to 

init 

2nd 

order 

diffc 

function 

/* 

Used 

to 

init 

diffc constant * 

/ 

/* 

Used 

to 

init 

2nd 

order 

diffc 

function 

/* 

Used 

to 

init 

1st 

order 

diffc 

function 

/* 

Used 

to 

init 

2nd 

order 

diffc 

function 

/* 

Used 

to 

init 

decision variable */ 


/* Difference constant for a Y step */ 


Tempi = XDec2Constl*X; 

*/ 

if (X < XEnd) { 

XStep = 1; 

XDecO = XDecOConst; 

XDecl = (Templ«2) + (Templ<<l) 

+ XDecOConst + XDeclConst; 

XDec2 = ((Templ<<l) + Tempi 

+ (XDecOConst»l) + XDeclConst) *X 
+ XDec2Constl + XDec2Const2 + XDec2Const3; 

} else { 

XStep = -1; 

XDecO = -XDecOConst; 

XDecl = (Templ<<2) + (Templ<<l) 

- XDecOConst + XDeclConst; 

XDec2 = (-(Templ<<l) - Tempi 

+ (XDecOConst»l) - XDeclConst)*X 

- XDec2Constl + XDec2Const2 - XDec2Const3; 

} /* end if */ 

Dec = ((Tempi + XDec2Const2)*X + XDec2Const3)*X; 
if (Y < YEnd) { 

YStep * 1; 

Dec = Dec + XDec2 - YDecConst*Y - DecConst; 

} else { 

YStep = -1; 

XDecO » -XDecO; 

XDecl = -XDecl; 

XDec2 = -XDec2; 

Dec = -Dec + XDec2 + YDecConst*Y - DecConst; 

} /* end if */ 


/* Used several times to save multiplies 

/* If plotting in positive X direction */ 
/* Set X increment */ 

/* Difference constant for an X step */ 

/* Init 1st order diffc function */ 

/* Init 2nd order diffc function */ 


/* If plotting in negative X direction */ 
/* Set X increment */ 

/* Diffc constant for an X step */ 

/* Init 1st order diffc function */ 

/* Init 2nd order diffc function */ 


/* Init decision variable */ 

/* If plotting in positive Y direction */ 
/* Set Y increment */ 

/* Final decision variable initiation */ 
/* If plotting in negative Y direction */ 
/* Set Y increment */ 

/* Negate X differences */ 


/* Final decision variable initiation */ 


Figure 6: RunRise algorithm initialization. (x,Y) and (XEnd t YEnd) are segment endpoints. 


if (X == XEnd) 

{LinPlot(X,Y,XEnd,YEnd); 
else 

for (X=X; X<=XEnd; X=X+XStep) { 

Plot(X,Y); 

XDec2 = XDec2 + XDecl; 

XDecl = XDecl + XDecO; 
if (Dec >0) ( /* perform Y step */ 

Y = Y + YStep; 

Dec = Dec + XDec2 - YDecConst; 

} else Dec = Dec + XDec2; 

} /* end for */ 


/* If degenerate curve, */ 

I* Plot a line */ 

/* Otherwise, */ 

/* For each X in the curve */ 

/* Plot a point */ 

/* Update the 2nd order diffc */ 

/* Update the 2st order diffc */ 

/* If must perform Y step */ 

/* Adjust Y accordingly */ 

/* Update the decision var accordingly */ 
/* If no Y step, update dec var accdgly */ 


Figure 7: RunRise algorithm loop. (X»Y) and (XEnd,YEnd) are curve segment endpoints. 
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d()x =di(x±l)-di(x) 

= 48Ai(x±l) - 48Aix 
= ±48Aj 

do y = 8Dif(x±l/2,y±2) - 8Dif(x±l/2,y±l) 

= 8Di(y±2) - 8Di(y±l) 

= ±8Di. 

Figures 8 and 9 show the RiseRun algorithm. Again, vari¬ 
able declarations and global initializations are not shown. 

The RiseRun algorithm, like the RunRise algorithm, per¬ 
forms 4 additions for each step in X, 2 additions for each 
step in Y. Thus algorithm cost is 

4tflXEnd-Xl + 2fllYEnd-Yl. 

Note, however, that lYEnd-Yl will in this case always be 
greater than IXEnd-Xl. 

6. Use Of The Algorithms In Tandem 

Because each of the RunRise and RiseRun algorithms only 
function for curve segments that have slope within a 
certain range, plotting a cubic curve with these algorithms 
will typically require that the curve be split into segments 
which satisify the algorithms’ slope range requirements. 
Algorithm initialization must then be performed for each 
curve segment, increasing overhead. 

The actual curve splitting itself will also increase overhead. 
Since each algorithm takes as input only function 
constants and segment endpoints, splitting essentially 
involves the location of appropriate segment endpoints. 
For the unoptimized algorithms, these endpoints will be 
the points on the curve at which the conditions lf(x)l = 1 
and f(x) = 0 hold true. Locating these points will involve 
floating point arithmetic. The appropriate algorithm must 
then be called for each segment. In the worst case, a curve 
will have to be split into seven such segments. However, 
use of the algorithms with spline and Bezier curves would 
typically require the splitting of curves into only two or 
three segments. 

7. Overflow 

Care must be taken to avoid overflow when using these 
algorithms. Below, we provide overflow analysis for the 
initialization and looping portions of each of the algo¬ 
rithms. 

We begin with the RunRise algorithm. During initializa¬ 
tion, the largest intermediate value that must be handled is 
the first initialization of the decision variable, ((Tempi + 
XDec2Const2) *X + XDec2Const3) *X. This represents 
2A'ix? + 2Bjx^ + 2Qx. For ease of algorithm use and anal¬ 
ysis, we define i such that each of the coefficients IAjI, IBjl, 
IQI and IDjI are less than i. The screen space variable Ixl 


has the maximum value w/2, where w is screen width. Thus 
in the worst case, the decision variable will take on the 
intermediate value liw(w^/4 + w/2+ 1)1. To represent this 
value without overflow, the condition 

Iiw(w 2 /4 + w/2+ 1)1 < 2 bitsA , 

where bits is the number of bits available for 
representation, must hold true. As an example, it is 
reasonable to expect screen width w to be less than 1280. 
We then have 

1280/ (320 2 + 640+ 1) < 2 W,J ‘ 1 
1280/(103041) <2 Wte ’ 1 

III < 2^ ,,5_1 /131892480. 

If a 32-bit word is to be used for representation, bits = 32 
and 1/1 must be less than or equal to 16. Clearly, this is too 
restrictive. If instead 64 bits are used during initialization, 
we have 1i I < 3.496549627e+10. Considering that 
|_log(3.496549627e+10)J is 35, this is quite reasonable. 

The algorithm changes the state of 5 variables while loop¬ 
ing: X, Y, Dec, XDecl, and XDec2. X and Y are screen 
space variables, and thus will not overflow unless any 
screen coordinate is larger than in magnitude — an 

unlikely event, given the present state of raster 
technology. XDecl represents the difference function 
di(x) = d2(x±l) - d2(x), and XDec2 the difference function 
d2(x). By definition, then, XDec2 will always be larger 
than XDecl in magnitude. 

Dec represents the decision function d(x,y) evaluated at the 
midpoint (X±l,Y±l/2). When curve slope lf(x)l < 1 (the 
RunRise case), the midpoint method guarantees that this 
point will be a distance d of at most 1/2 in Y from the point 
(X±l,f(X±l)) (see again figure4 and table 1). Thus we have 

d = lf(X±l) - (Y±l/2)! < 1/2. 

Scaling by 2D; gives 

2Dj d = !2Dif(X±l) - 2Dj(Y± 1/2)1 < Dj. 

Note now that 2Djf(X±l) - 2Dj(Y±l/2) is equivalent to the 
decision variable (4). Thus we have ld(x±l,y±l/2)l = Id eel 
SDj. 

XDec2 represents the difference function d2(x) = 
d(x±l,y±l/2) - d(x,y±l/2) = g(x±l) - g(x), where g(x) is the 
function (3). We define the difference d2'00 = d2(x)/2Dj = 
f(x±l) - f(x), where f(x) is the function (2). In the RunRise 
case, lf(x)l < 1, which implies that d 2 ’(x) = lf(x±l) - f(x)l < 
1, and then it follows that IXDec2l = Id2(x)l < 2Dj. This 
allows us to conclude that representing the RunRise 
looping variables requires only the fulfillment of the 
almost trivial inequality 
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XDecOConst = (Ai«5) + <Ai«4); 
XDec2Const = XDecOConst»l; 

XDeclConst = XDec2Const + <Bi<<4); 
YDecConst = Di«3; 

XDecConst ® Ci<<3; 

Tempi = (Ai<<3)*X; 
if (X < XEnd) { 

XStep = 1; 

XDecO = XDecOConst; 

XDecl = (Tempi<<2) + (Templ<<l) 

+ XDecOConst + XDeclConst; 

XDec2 = ( (Templ«l) + Tempi 

+ XDeclConst + XDec2Const)*X 
+ XDeclConst + (Ai<<l) + XDecConst; 
Dec = ((Tempi + (XDeclConst>>l))*X 
+ (XDeclConst»1) - (XDec2Const»2) 
+ XDecConst) *X + Ai + (Bi«l) 

+ (Ci«2); 

) else { 

XStep = -1; 

XDecO = -XDecOConst; 

XDecl = (Templ<<2) + (Templ<<l) 

- (XDecOConst«l) + XDeclConst; 
XDec2 = {-(Templed) - Tempi 

+ XDecOConst + XDec2Const 

- XDeclConst)*X + XDeclConst 

- XDecOConst - (Aieel) - XDecConst; 
Dec = ((Tempi + (XDeclConst>>l) 

- XDec2Const) *X - (XDeclConst»l) 

+ XDec2Const - (XDec2Const>>2) 

+ XDecConst)*X - (Ciee2) + (Bieel) 

- XDecConst; 

} /* end if */ 

if (Y e YEnd) { 

YStep = 1; 

Dec = Dec - YDecConst*Y - YDecConst; 

} else { 

YStep - -1; 

XDecO = -XDecO; 

XDecl - -XDecl; 

XDec2 = -XDec2; 

Dec = -Dec + YDecConst*Y - YDecConst; 

) /* end if */ 


/* Used to init diffc constant */ 

/* Used to init 2nd order diffc fetion */ 
/* Used to init 1st order diffc fetion */ 
/* Difference constant for a Y step */ 

/* Used to init lst/2nd ord diffc fets */ 

/* Used to save multiplies */ 

/* If plotting in positive X direction */ 
/* Set X increment */ 

/* Difference constant for an X step */ 

/* Init 1st order diffc function */ 

/* Init 2nd order diffc function */ 


/* Init decision variable */ 


/* If plotting in negative X direction */ 
/* Set X increment */ 

/* Diffc constant for an X step */ 

/* Init 1st order diffc function */ 

/* Init 2nd order diffc function */ 


/* Init decision variable */ 


/* If plotting in positive Y direction */ 
/* Set Y increment */ 

/* Final decision variable initiation */ 
/* If plotting in negative Y direction */ 
/* Set Y increment */ 

/* Negate X differences */ 


/* Final decision varaible initiation */ 


Figure 8: RiseRun algorithm initialization. (X,Y) and (XEnd,YEnd) are segment endpoints. 


if (Y == YEnd) 

{LinPlot(X,Y,XEnd,YEnd) ; 
else 

for (Y=Y; Y<=YEnd; Y=Y+YStep) { 

Plot (X,Y) ; 

if (Dec < 0) ( /* perform X step */ 

X = X + XStep; 

Dec - Dec + XDecZ - YDecConst; 
XDec2 = XDec2 + XDecl; 

XDecl = XDecl + XDecO; 

} else Dec = Dec - YDecConst; 

} /* end for */ 


/* If degenerate curve, */ 

/* Plot a line */ 

/* Otherwise, */ 

/* For each Y in the curve */ 

/* Plot a point */ 

/* If must perform X step */ 

/* Adjust X accordingly */ 

/* Update the dec va.ria.bLe accordingly */ 
/* Update the 2nd order diffc */ 

/* Update the 1st order diffc */ 

/* If no X step, update dec var accdgly */ 


Figure 9: RiseRun algorithm loop. (X,Y) and (XEnd,YEnd) are curve segment endpoints. 
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2D, < 2 bits ' 1 . 

Then if bits - 32, we must have Dj < 2 3 ® to loop in the 
RunRise algorithm without overflow. 

The overflow analysis of initialization for the RiseRun 
algorithm is similar to the RunRise initialization analysis. 
The largest intermediate value calculated during initializa¬ 
tion is the partial sum of the decision variable 8A|x^ + 
(±12Ai+8Bi)x 2 + (6Aj±8Bi+8Ci)x + (±Ai+2Bi±4Ci). Here 
the worst case value is li (w^ + 5w2 + 12w + 7)1, giving us 
the inequality 

1/ (w 3 + 5w 2 + 12w + 7)1 < 2 bilsA 

if overflow is to be avoided. Since this inequality is clearly 
more restrictive than the inequality required for RunRise 
initialization, the use of more than 32 bits is appropriate. 

Because the condition lf(x)l < 1 does not hold for RiseRun 
curves, overflow analysis of the RiseRun looping section 
will differ significantly from the similar RunRise analysis. 
The unoptimized RiseRun algorithm changes the same five 
variables as the unoptimized RunRise algorithm. We can 
again conclude that the overflow restrictions required by 
Dec and XDec2 will most seriously affect program utility. 

In the RiseRun algorithm, d2(x)/2Dj = f(x±3/2) - f(x±l/2). 
But since If (x)l > 1, we cannot conclude that lf(x±3/2) - 
f(x±l/2)l < 1 and IXDec2l < 8D[. Theoretically, the differ¬ 
ence lf(x±3/2) - f(x± 1/2)1 could be infinite. Clearly, how¬ 
ever, a curve that fulfilled such a condition would have infi¬ 
nite slope, and f(x) would then simply describe a vertical 
line. In fact, any curve segment that fulfills the condition 
XEnd = X would for our purposes be a vertical line, and 
would be most quickly rendered by a primitive line plotting 
routine. Since our RiseRun algorithm captures such cases, 
we can guarantee that XEnd * X and thus that lf(x±3/2) - 
f(x± 1/2)1 < h t where h is screen height. This allows us to 
conclude that IXDec2l = d2(x) = l8Dif(x±3/2) - 8Dif(x±l/2)l 
< 8Di/i, and gives us the restriction 

8Di h £ 2 bits ' 1 . 

The RiseRun decision variable d(x±l/2,y±l), like the 
RunRise decision variable, is proportional to a distance. 
We'll call this distance d ' = f(X±l/2) - (Y±l). However, 
because lf(x)l £ 1 for RiseRun curves, \d'\ has the wider 
range [0,/iJ, which implies that ld(x±l/2,y±l)l has the 
proportional range [0,8D|A], and that Id eel < 8Dj/i. Thus 
the above overflow restriction for XDec2 also applies to 
Dec. 

If bits = 32 and h = 1024, we have 
2 13 Di <2 31 
D; < 2 18 . 


8. Algorithm Use With Parametric Curves 

While the algorithms presented in this paper were designed 
for the direct plotting of cubic functions, they can be used, 
with slight modifications, to plot parametric curves. 

Typically, parametric curves are plotted by making both 
the X and Y coordinates functions of a parameter f, which 
may take on values in the range [0,1]. Our algorithms 
could be used to plot such curves by running them twice: 
once to obtain X coordinates, and once to obtain Y 
coordinates. In both cases, the algorithm variable X would 
contain the current value of the parameter t. The variable Y 
would contain an X or Y coordinate. Below, we present 
overflow analysis only for the parametric equation x(f). 
For overflow restrictions for y(f)i simply substitute h for 
w. 

Since our algorithms can only use a step size of 1, X 
cannot, like t, take on values only in the range [0,1]. 
Instead, we could let it take on integer values in the range 
[0,nj, where the even number n is the number of parametric 
steps desired. If we have the parametric equation 

x(f) = At 3 +B* 2 +0, (6) 

/ in [0,1], the equation 

x(0 = A (t' 3 /n 3 ) + B(r' 2 /n2) + C(f'/n), 

t* in [0,n], would describe the same coordinates. Thus the 
parametric version of the RunRise decision variable (4) is 

Aj(/'±1) 3 + Bi««'±l) 2 + Civ'll) - D i « 3 (X±l/2). 

Note that we have not scaled the parametric decision vari¬ 
able by 2 because we know that n is an even number. The 
RunRise parametric differences are: 

d 2 (0 = Aj(f'±l) 3 + Bi«(f'±l) 2 + Cj«V±l) - Ait' 3 - 
Bjrtf' 2 

- C \r?l' 

= Ai(/' 3 ±3/' 2 +3/'±l) + Bi«(f' 2 ±2/'+l) + Ci« 2 (/'±1) 

- Aj/' 3 - Bjnf' 2 - Cjn 2 t’ 

= Ai(±3/' 2 +3/'±l) + B,rt(±2/'+l) ± Cjn 2 
= ±3Aj/' 2 + (3Aj±2Bj«)t' + (±Aj+Bi«±Cjn 2 ) 

di(0 = ±3Aj(r'±l) 2 + (3Ai±2Bi«)(t'±l) - ±3Ait' 2 

- (3Aj±2Bi«)f 

= ±3Ai(±2f+l) ± (3Ai±2Bi*) 

= 6Aj/ + (±6Ai+2Bjn) 

do/» = ±6A] 

For the parametric versions of our algorithms, we only pre¬ 
sent overflow restrictions for the looping sections. 
Analysis for the parametric RunRise variables XDec2 and 
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Dec is quite similar to the analysis for the identically 
named non-parametric RunRise variables, and gives the 
overflow restriction 

Djn J < 2 pits-l. 

If bits = 32 and n = 256, we have Dj £ 128. 

Remember that /' refers to a step number rather than an 
absolute screen location. This enables us to use a low value 
of n, 128, and still know that the algorithm will function 
over the entire screen. If we were to make n smaller, the 
upper limit for IDjl would increase. If we had to plot a curve 
segment that required more than n steps, we could simply 
split the curve into several sub-segments of a more 
managable size. 

Since many parametric curves interpolate or are controlled 
by points chosen on a computer screen, it is often the case 
that the coefficients A, B, C, and D in (6) are integers, not 
rational. In such cases, A{, Bj and C{ are equal to A n , B n , 
and C n . Most important, however, is the observation that 
Dj = 1. In such cases, our RunRise overflow restriction 
above becomes 

n 3 < 2 bits -l. 

If bits = 32, we have n < 1290. 

The parametric version of the RiseRun decision variable (5) 
is 

8Ai(t'±l/2) 3 + 8Bin(t'±l/2) 2 + 8Cjn 2 (/’±l/2) - 
8Di« J (X±l). 

The RiseRun parametric differences are: 
d 2 (0 = ±24Ajt' 2 + (48A i ±16B i «)/' 

+ (±26Ai+16Bi«±8Cin 2 ) 
di(0 = 48Ajf' + (±72Aj+16Bin) 
do/* = ±6Ai 

The overflow restriction for the RiseRun parametric case is: 
8Djn J w< 2 i,,Vs - 1 . 

If bits = 32, w = 1280, and \D[\ < 64 we have 
lODi/r^ 16 <S2 31 

n 3 <2 15 /10, 

giving n < 14. 

If Di = 1, bits = 32 and w = 1280, we have 
10n J 2 10 £2 32 

n 3 < 2 22 /10, 

giving n < 74. 


9 . Algorithm Comparison And Evaluation 

In this section, we compare the RunRise and RiseRun algo¬ 
rithms as they would be used with parametric curves with 
the algorithms A and B presented by Klassen in [11]. Table 
2 shows the operation costs and the overflow restrictions 
associated with Klassen's algorithms and our algorithms. 
We have assumed that a barrel shifter is available. 
Furthermore, while we generally follow Klassen's practice 
of weighing each branch of a conditional statement 
equally, we have made an exception for the branches in our 
algorithms' main loops. In particular, since the RunRise 
algorithm plots curve segments with absolute slope less 
than one, we have assumed (rather conservatively) that the 
algorithm will make 3 X steps for every 2 Y steps, and thus 
that the if statment in figure 7 will be true only 40% of the 
time. By similar logic, we assume that the i f statment in 
figure 9 will be true only 40% of the time. As curve slope 
nears zero or infinity, these percentages decrease, and 
algorithm performance improves. 

Clearly, the main loops of both the RunRise and RiseRun 
algorithms use less operations than Klassen's algorithm B, 
and are comparable in cost to Klassen's algorithm A. 
During initialization, our algorithms use no expensive 
divide operations, and use about half the number of add and 
multiply operations used by Klassen's algorithms. It 
should be noted, however, that while our algorithms require 
that a cubic curve be split into segments, both of Klassen's 
algorithms A and B do not depend on slope. Thus, 
initialization for the RunRise and RiseRun algorithms will 
in practice require slightly more addition and multiply 
operations than Klassen's algorithms, as well as several 
floating point calculations. 

Klassen's algorithm A has by far the most liberal overflow 
restriction -- it is linear in n, the number of parametric 
steps, and w, screen width in pixels. The RunRise algo¬ 
rithm and Klassen’s algorithm B both have restrictions that 
are cubic in n, with the RunRise algorithm allowing twice 
as many parametric steps as algorithm B. The RiseRun 
overflow restriction, however, is cubic in n and linear in w. 
This severely restricts the RiseRun algorithm's utility if 
only 32 bits are available. 

It should be noted that the overflow restriction shown for 
Klassen's algorithms guarantee that both initialization and 
looping will be accomplished without overflow. The 
restrictions shown for the RunRise and RiseRun algorithms 
guarantee only that looping will be accomplished without 
overflow. Initialization of our algorithms requires many 
more bits for representation than does initialization for 
Klassen's algorithms (with the exception of algorithm A's 
extended precision divide (xdiv) operation). 

Klassen's algorithm A uses a fixed point approach, and thus 
incorporates an inherent level of error not present in the 
other algorithms. Both of Klassen's algorithms can be 
used with rational coefficients, but doing so would require 
floating point calculation, increase error in algorithm A, 
and introduce error into algorithm B. Our algorithms 
remain perfectly accurate even with rational coefficients. 
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Table 2. A comparison of Klassen’s algorithms from [11] with the algorithms presented in this paper 
as used for plotting parametric curves with integer coefficients, n is the number of parametric steps 
made, w is the width of the screen in pixels, bits is the number of bits used to represent algorithm values, 
Z is the number of bits of fractional precision. 
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2 

22 

5 

0 

0 

16 

2 

14 

8 wn 3 < 2 bitsA 

Klassen's A 

3 

0 

3 

3 

17+3Z 

12 

1 

5 

2 

z 

28+2Z 

46wn < 2^ lts ' * 

Klassen’s B 

10 

3 

0 

10 

40 

12 

5 

0 

0 

6 

33 

2n 3 < 2 bi,s - 1 


If non-parametric curves are being plotted, overflow 
restrictions for our algorithms improve: the RunRise 
algorithm requires only that the product of the rational 
denominators Dj be less than and the RiseRun 

algorithm is similar, but is linear in screen width. 
Overflow restrictions for Klassen’s algorithms in such a 
case will not show such a significant improvement. 

In summary, Klassen’s algorithms make efficient use of 
available bits, but at the price of algorithm speed or accu¬ 
racy. Our algorithms require more representational bits, 
but are faster and more accurate. We believe that if word 
size is 64 or larger, or non-parametric curves are being 
rendered, our algorithms are clear winners. 

10. Conclusions And Future Work 

We have presented integer-only algorithms that allow fast, 
accurate plotting of cubic curves. Analysis shows that 
using these algorithms to plot parametric curves may 
require more representational bits than already existing 
algorithms. But if such bits are not at a premium, or non- 
parametric curves are being plotted, our algorithms are the 
algorithms of choice. 

We plan to explore further the use of these algorithms with 
parametric curves, spline curves, and Bezier curves. In par¬ 
ticular, we would like to explore the use of these algorithms 
with adaptive forward differencing. 
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Abstract 

Certain classes of algebraic curves and surfaces admit both 
parametric and implicit representations. Such dual forms are 
highly useful in geometric modeling since they combine the 
strengths of the two representations. We consider the problem 
of computing the rational parameterization of an implicit curve 
or surface in a finite precision domain. Known algorithms for 
this problem are based on classical algebraic geometry, and 
assume exact arithmetic involving algebraic numbers. In this 
work, we investigate the behaviour of published parametriza- 
tion algorithms in a finite precision domain and derive succint 
algebraic and geometric error characterizations. We then in¬ 
dicate numerically robust methods for parameterizing curves 
and surfaces which yield no error in extended finite precision 
arithmetic and alternatively, minimize the output error under 
fixed finite precision calculations. 

Keywords: curves and surfaces, geometric modeling, nu¬ 
merical methods, computational algebraic geometry. 

Introduction 

Algebraic curves and surfaces are the most common represen¬ 
tations for curved objects in geometric modeling. Algebraics 
satisfy polynomial equations, usually with rational coefficients. 
A rational algebraic curve or surface is one whose points can 
be represented as rational functions in some parameters. 

Each form has certain benefits and drawbacks. The parametric 
form is better for rapid display and interactive control; the 
implicit form defines a half-space naturally and is suited for 
modeling. The class of all algebraics is also much larger than 
the class of rational algebraics. Dual forms can have the best 
of both worlds. 

Mathematical techniques from algebraic geometry have re¬ 
cently been applied to the problem of converting between the 
two forms. While implicitizing a parametric curve or surface 
is always possible, the converse (rational parameterization) is 
not always possible. That is, all algebraics are not rational. 


However, important classes of curves and surfaces are rational 
and algorithms for their rational parameterization based on al¬ 
gebraic geometry have been given in [1],[2],[3], [4],[18], and 
some will be analyzed here. There is also a numerical method 
due to Jacobi which works by iteratively converting a conic or 
quadric to standard form (see [11]). 

Functionally, rational parameterization takes one implicit equa¬ 
tion in n variables, and for each implicit variable returns a 
rational function in n — 1 parameters. Since the rational func¬ 
tions have a common denominator, the output can be viewed 
as consisting of n + 1 polynomials. 

While the input implicit equation is assumed to have ratio¬ 
nal coefficients, the output polynomials may require algebraic 
number coefficients, which are (informally speaking) roots of 
polynomials, such as y/2. The algorithms based on algebraic 
geometry assume exact computations. While techniques exist 
for manipulating algebraic numbers exactly, they are expen¬ 
sive. In this work, we consider parameterization algorithms in 
a finite precision domain. 

This paper is organized as follows. We choose a finite pre¬ 
cision numerical domain and explain our general approach to 
rederiving a parameterization algorithm to work in this domain. 
First, we analyze algorithms for conics and quadrics, and then 
analyze an algorithm for singular cubic curves. The error in 
each algorithm is described algebraically. We then use the al¬ 
gebraic error analysis to derive simple geometric error bounds 
for conics and quadrics. Finally, we consider singular cubic 
parameterization from another standpoint, showing that they 
can in fact be parameterized exactly using only rational arith¬ 
metic. Finally, we conclude by briefly discussing extensions 
of this approach, e.g. to cubic surfaces. 

Approach and Numerical Model 

To examine this problem when exact arithmetic is not allowed, 
we focus on the use of algebraic numbers. We stop short of 
allowing floating-point arithmetic; instead, the algorithms will 
use rational arithmetic throughout Algebraic numbers will be 
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approximated by rationals. This allows us, as a first study, to 
isolate the effects of the error caused by rational approximations 
to algebraic numbers. 

Recall that parameterization algorithms take a polynomial with 
rational coefficients as input, and output several polynomials. 
The algorithms can be restructured so that each coefficient of an 
output polynomial is given as a formula in the (symbolic) input 
coefficients, and some additional symbols. Every algebraic 
number required by the parameterization will be represented 
by one symbol in the formulas. 

If the algebraic numbers themselves are substituted for their 
symbols into the output formula, the output will be exact 
However, we only allow rational approximations to algebraic 
numbers. Substituting these numbers will yield only an approx¬ 
imate output This output will converge to the exact one as the 
rational approximations converge to the algebraic numbers. 
Thus, given a certain precision to which algebraic numbers 
are to be rationally approximated, and a bound on the size 
of the rational input coefficients, one can calculate from the 
formulas a bound on the rational output coefficients, and hence 
finally a bound on the precision required to carry out the entire 
calculation, if fixed finite precision is desired. 

Our approach to restructuring the algorithms consists of exam¬ 
ining them step by step, and eliminating from each step every 
subexpression that must vanish if exact arithmetic was used. 
As it turns out, there are two benefits of this approach: it is 
often possible to carry through the computation so that the out¬ 
put is expressed as a formula in the input, and error formulas 
are easily derived. While this method seems to work very well 
for parameterization algorithms, its applicability in other set¬ 
tings is likely to be limited, where repeated computations with 
algebraic numbers may be required. 

Conic Parameterization 

We restructure the algorithm in [1] for conic parameterization. 
The algorithm is given for conics in homogeneous form; this 
allows the use of both projective and affine transformations. 
The algorithm is then analyzed for the error in its output when 
approximations are used for algebraic numbers. 

Given the equation of a conic plane curve, parameter functions 
for the curve are derived. The parameter functions are given 
as closed form formulas in the parameter U the coefficients of 
the curve, and the coordinates of a point on the curve. 

INPUT. An irreducible conic curve given by f(x f y) = <i 2 ojf 2 4 
auxy + aoix 2 + a\oy 4 ao\x 4 aoo = 0. 

OUTPUT. Rational functions (x(t),j/(t)) of degTee at most 
two, such that f(x(t ), y(t)) = 0 . 

ALGORITHM. 

1. Homogenize the conic. This yields the homogeneous equa- 
tion F{X, Y, W) = anY 2 + a u XY + aaiX 2 + a l0 YW + 
aoiXW + aaoW 2 = 0. If the X 2 , Y 2 or W 2 term is missing 
from the conic’s equation, then it will be linear in the cor¬ 
responding variable, and can be immediately parameterized. 


Compute quadratic polynomials X(<),y(<) and W(t) such 
that F(X , y, W) — 0 , and go to step 4. 

2. If all squared terms are present, apply a linear transformation 
to cancel one of these terms. 

3. Parameterize the transformed conic, and apply the inverse 
transformation to the parameterization; yielding three quadratic 
polynomialsX(f), Y(t) and W(t) such thati^X, Y t W) = 0. 

4. The parameterization for the affine conic is then given by 

x(<) = J£T(t)/Pr(t) >y (t) = 

TRANSFORMATIONS. If all three squared terms are present, 
then any one of the following three transformations may be 
used in step 2 of the conic parameterization algorithm. The 
transformations to cancel X 2 and Y 2 are more general than 
that for W 2 . Hence we explain the X 2 case in most detail (the 
Y 2 case is similar and omitted). 

• To cancel the X 2 term, use the transformation 

X = bX i 

y = cXi + Yi (i) 

W = dX 1 4 Wi 

where ( 6 , c, d) are the homogeneous coordinates of some point 
on the curve. For the transformation to be well-defined, b must 
be non-zero. Then, if d ^ 0, the transformation is affine; other¬ 
wise it is projective. Since proportional projective coordinates 
represent the same point, we make the restriction d = 0 or 
d = 1. If d = 0, then we should also make a restriction 6=1 
or c = 1 ; since 6 ^ 0 is required for the transformation to be 
well-defined, we will restrict 6 = 1 in this case. 

Transforming F yields a new conic curve with implicit equation 

Fi(X u Y u Wi) = FibXucXi + YudXi + Wt) 

= F(b,c > d)X^ + F 2 (X u Yi,W l ) 

Since the subexpression F(b, c, d)X 2 must vanish, we only 
need to parameterize 

F 2 (Xx,Y u W x ) = (a 10 d 4 2 a 2 oc + anb)X x Yi 4 
(2aood 4 aioc 4 ao\b)X\Wi 4 
« 20 Y\ -|- axoYiWx 4 - aooW 2 = 0 

The curve F 2 = 0 passes exactly through the point (1,0,0) 
and can be parameterized by intersecting it with the pencil of 
lines Yx = tW\ which pass through this point, yielding 

Xi(*) = amt 2 4- aio< 4 aoo 

yi(i) = —(aiod 4 <*116 4 2a2oc)i 2 — 

(2aood 4* aoi 6 4- aioc)2 (2) 

W\{t) = — (aiod 4 <*116 4 2a20c)<— 

( 2a<x\d 4* aoi 6 4 <Moc) 

This symbolic parameterization for F 2 is independent of the 
specific values for 6 , c and d , i.e., it is always exact, since only 
rational operations in the coefficients of Fz are used. 

Since F(b, c, d) = 0, Fi{X u Y i 9 Wi) = F 1 (X ll Y u W i ) 9 and 
hence the parameterization (2) also applies to F\. Applying 
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compute f(x(t) y y(t)) directly: 


the inverse linear transformation to this parameterization im¬ 
mediately yields a formula for the original conic: 

X(t) = b(a 2 ot 2 + aio< + aoo) 

Y(t) = —(aiod-f an 6 + a 2 oc)J 2 — 

(2aood H- aoi6)t + aooc (3) 

W(t) = awdt 2 — (an& + 2a2oc)f— 

(aood + aoi& -f aioc) 

• The transformation cancelling the W 2 term is always affine 
(i.e. d = 1); it is the translation taking the point (6, c, 1) to the 
affine origin (0,0,1). The parameterization formulas derived 
are 

X(t) = -(aiod + a 2 oc + an&)i 2 - 

(aoid + 2aaib)t + ao 2 C 

y(f) = awbt 2 — (aiod + 2a2oc)t 4- (aoid-f (4) 
anc + aoafc) 

W{i) = d(a20^ 2 + ant + ara) 

Backward Error Analysis: Conics 

The only computation in the algorithm given above is to derive 
the coordinates of a point on the input conic curve. Once these 
coordinates are found, the parameterization is given as a closed 
form formula in terms of those numbers and the coefficients of 
the input curve. The output parameter functions x(i) and y(t) 
are formulas in algebraic numbers b and c (d is always either 
0or 1), satisfying f(x(t) y y(t)) = 0. When approximations b 
and c are used for b and c, the algorithm will output approximate 
parameter functions x(t) and£(f) such that /(£($), y(t)) ^ 0. 
These parameter functions also correspond to some algebraic 
curve. We would like to find the implicit equation of this new 
curve and compare it to the original input curve. This is the 
approach of backward error analysis. 

LEMMA. Let the first transformation above be used in com¬ 
puting the parameterization. Then the output parametric curve 
exactly satisfies the perturbed implicit equation f(x , y) = 
a 2 oy 2 + an xy + (ace — 8)x 2 + aio y + aoi x 4- a<x> = 0, where 
the value 6 is given by 

/(M) 

-2 p - 

a2o c + anc *+■ aoa 

PROOF. The analysis begins by computing the value of the 
expression f(x(t) y j?(t)). This value must vanish when exact 
arithmetic is used, since every point on the output (paramet¬ 
ric) curve must be on the input (implicit) curve. However, 
in the presence of numerical approximations, it will be non¬ 
zero, and can be found symbolically. It depends on which 
transformations above was used. In the following, we use the 
relationship f(X/W,Y/W) = F(X y Y y W)/W 2 . We now 



if d = 1 
if d = 0 


= F(6J?i(t),cXi(t) + <!&(«) + m(t)) 

W 2 (t) 

_ f (t, c, d)Xftt) ± FtjXtjt), flit), WkCO) 

W^(t) 

= Fj^dJXljt) 

W*(t) 

_ .F(6,M) X\t) 

P W*(t) 

= ! 2 (t) 


The key is that Fz(Xi{t) y Vi(f), Wi(<)) = 0 even when ap¬ 
proximations are used. 

Thus each point on the output curve evidently satisfies the 
equation f(x , y) — (F(6, c, d)/b 2 )x 2 = 0. Since F(b , c, 1) = 
/(S,c) andF(l,c,0) = a 2 oc 2 +an c+ao 2 , the lemma follows. 
□ 

Similarly, one can show that if the second transformation is 
used, the approximate output parameterization satisfies the 
equation /(x, y) — 8y 2 = 0 with 8 = /(6, c)/c 2 , for d = 1 , 
and 8 = 020 + an b + aoa? 2 for d = 0. 

Finally, if the third transformation was used, then the output 
parametric curve satisfies the implicit equation/(x,y) — 8 = 0, 
where 8 = /(6, c). 

Thus the effect of approximating (6, c) by rationals is an output 
parametric curve that corresponds to the input implicit curve, 
perturbed in precisely one of the coefficients ao 2 , fl 20 , aoo, de¬ 
pending on the transformation used. 

We note that the above discussion remains valid under scaling 
of the input equation by a constant. 

Geometric Error Bounds: Conics 


The algebraic error analysis tells us the implicit equation of 
the approximate output curve; it is natural to investigate the re¬ 
lationship between the input (exact) and output (approximate) 
curves. For conics and quadrics, we can derive geometric error 
bounds in terms of the magnitude of the coefficient perturba¬ 
tion. 

In [10], general bounds are given for local geometric perturba¬ 
tions at a point on a curve due to random perturbations in the 
coefficients of its equation. However, the perturbations that 
appear as a result of approximations in the parameterization 
process have a definite structure, which we exploit to derive 
global geometric error bounds. 

We investigate the geometric effects of perturbing a single 
coefficient in the equation of a conic curve. The perturbations 
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Figure 1: Conics Perturbed in the Higher Order Coefficients 


yield an entire family of conics. In particular, the effect of 
perturbing the constant coefficient is investigated. 

It will be shown that perturbing the constant coefficient gives 
rise to a conic similar to the original conic. We then bound 
the maximum orthogonal distance between the original and 
perturbed conic. 

We first list some relevant facts about conics, from [19] and 
[16]. Consider the affine quadratic equation of a conic curve 
C, in the form 

F(x> y) = ax 2 + by 2 4- 2hxy + 2 gx + 2 fy + c = 0 


The discriminant of C is 


A = 


a h g 

h b f 

9 f c 


= abc + 2fgh — af 2 — bg 2 — ch 1 


The following facts about conics are known: 

1. C degenerates to a pair of lines when A = 0 

2. C is a parabola when ab—h 2 = 0, an ellipse when a 6— h 2 < 
0, and a hyperbola when ab — h 2 > 0. 

3. When C is not a parabola, its center is given by 

(!±1=.bg gh-af \ 
v ab-h* » ab—h* /• 

4. The axes of the conic are given by the equation h(x 2 — y 2 ) — 
(a — b)xy = 0. 

5. The conic can be translated to have its center at the origin, 
and axes rotated to the principal axes. In this coordinate system 
its equation is 


F(x, y) = (a + 6 + R)x 2 + (a + b - = 0 


where R 2 = (a — b) 2 + Ah 2 . It is clear that perturbing the 
constant term c in the equation of a conic will produce a new 
conic of the same type that is concentric and coaxial with the 
original (see also [7]). Perturbing the coefficients of x 2 or y 2 , 
on the other hand, can change all these quantities: Figure 1 
shows a family of conics perturbed only in the coefficient of 
x 2 \ they vary in type, center, and axis. We will therefore only 


consider the third transformation of the conic algorithm, which 
only perturbs the constant coefficient. 

Even when only the constant coefficient is perturbed, the conic 
could still degenerate into a pair of lines. A large enough 
perturbation could turn a hyperbola into one that is concentric 
and coaxial to the original, but with transverse and conjugate 
axes reversed. Hence, an upper bound must be imposed on the 
perturbation. Since the constant coefficient c appears linearly 
in the discriminant A, so will the perturbed coefficient c 4- 
and hence one can immediately bound |6| to avoid this case. 
If this bound is very small the conic will already be close to 
degenerate. 

For perturbations smaller than this bound, then, we wish to ge¬ 
ometrically describe the error. Define the (orthogonal) distance 
from a point p on one conic to the other conic as the shortest 
distance along the normal vector at p to the other conic. Then 
the maximum orthogonal distance from a point on one conic to 
the other will occur at one of the extreme points of the conic 
along its semi-axes, if ellipse, or transverse axis, if hyperbola. 
Now suppose one is given two conics C, C, where the second 
conic is derived by perturbing the constant coefficient in the 
equation of the first (if C is a parabola, some slight modifi¬ 
cations will apply to the arguments below). Then they will 
be concentric and coaxial, and we can consider their equa¬ 
tions in a coordinate system where their center is at the origin 
and their axes are aligned with the primary axes. In this co¬ 
ordinate system their equations will take the form f(x,y) = 
Ax 2 4- By 2 + C\ = 0 and /(x, y) = Ax 2 + By 2 + Cz = 0. 
Let d xy i x be the distances along the r-axis from the origin 
(which is the center) to C, C respectively. Likewise, let d y , d y 
be the distances along the y-axis. That is, d x an d y are simply 
the lengths of the semi-axes of the conic (in the case of a hyper¬ 
bola, only one of these distances is finite). Then d x = d x +Px 
and d y = d y + p y . One of p x and p y will be the maximum 
orthogonal distance between the two curves. We can solve 
directly for p x and p yt the maximum and minimum geometric 
error. 

To solve for p x , put y = 0 in the curve equations. Then 
d\ = and d? x = Hence <P X - d 2 = (d x 4 * p x ) 2 - 
d 2 = . So p 2 + 2 d x px = and since p x — 0 

whenCi-C 2 = 0, we find that p x = -d x + \Jd | 

Revert to the original coordinate system, where the conics have 
equations ax 2 + by 2 -h 2 hxy + 2 gx + 2 fy + c\ = 0, ax 2 -f 
by 2 4- 2 hxy + 2 gx 4- 2fy 4- C 2 = 0; then, by the coordinate 
transformations of the previous section, and some algebra, C\ — 
Cz = 2(ci — cz). Using the definitions for A and B t and R as 
given in previously, and putting 6 = c\ — C 2 , we find that 



Now suppose it is desired that \p x \ < c for some e > 0, and 
we wish to bound \6\. To have | < c, it is necessary that 
p x < e and p x > —c. For reasons that will soon be clear, we 
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will require e < d x . Considering e ach case separatel y, 

1. Px < e implies that (-<f* + + (s+I+r) S ) < e - 

2 . p x > -f implies + yfiH + (s+£Fr) > -e. 

After considering both possibilities for the sign of a + b + 72, 
the requirements above may be satisfied by taking 

|*| < c(2d x + c)|(a + 6 + i?)/2| 

|*| < c(2d x -c)\(a + b + R)/2\ 


for cases (1) and (2) respectively. 

Finally, recalling that e < d x , choices (1) and (2) can be 
simultaneously satisfied by choosing 


\6\ < c • d x ■ 


a ■+■ b + 72 
2 


This is the only simplification made in the calculation, and at 
most a factor of two of accuracy (one bit) is lost 
The error along the y axis is bounded in an identical way. The 
quantities d x and d y are independent of any scaling of the 
coefficients of the original conic by a constant but the scale 
factor will be linearly present in the quantities a b + R and 
a + b — R. Hence, if 6 is defined as in the backward error 
analysis for conics, these bounds correct for the scale factor 
automatically. Keeping this in mind, it suffices to compute 6 
such that 


i*i <« 


min(d x • \a H- b H- 72|, d y • ]e -j- b — 72|) 
2 


Quadrics 

The results for conics generalize directly to quadrics. It is 
possible to derive explicit formulas of degree two for the 
parameterization, in a pair of parameters s,t. The formu¬ 
las are small; the only computation required is that of find¬ 
ing a point (a, b } c, d) on the homogeneous conic. There are 
four choices of transformations, one to cancel each squared 
term. A corresponding error analysis holds. For instance, 
if the W 2 term is cancelled using an approximate (affine) 
point (a,6,c), then the output parameterization will satisfy 
f(x(s t t) y y(s } *),z(M)) - /(«,£, c) = 0, i.e., the original 
input equation perturbed in the constant coefficient. 

This raises an important point. In general, a parametric curve of 
degree n corresponds to a curve of algebraic (implicit) degree 
71 , but a parametric surface of degree n may correspond to 
a surface of algebraic degree up to n 2 . Thus when using 
approximations in a parameterization, one might legitimately 
question whether the algebraic degree of the output is the same 
as that of the input. In the case of quadrics (n = 2), it would 
be unpleasant if a parameterization algorithm could actually 
output a cubic or quartic surface. Fortunately, the error analysis 
above allows us to answer this question in the negative, since a 
set of rational parametric equations of a surface satisfy a unique 
irreducible algebraic surface. 



Figure 2: Quadrics Perturbed in the Constant Coefficient 

Geometric Error Bounds: Quadrics 

As for conics, there is a quadric discriminant. The sign of the 
discriminant, among other quantities, distinguishes amongst 
the various quadric surfaces. Essentially, perturbing the con¬ 
stant coefficient preserves the center and orientation, although 
the quadric could degenerate from a hyperboloid of one sheet 
to a cone to a double-sheeted hyperboloid (see Figure 2). Per¬ 
turbing the highest order coefficients could cause an ellipsoid 
to change to a cylinder to a one-sheeted hyperboloid, for ex¬ 
ample, in addition to changing its orientation and center (Fig¬ 
ure 3). Since the geometric errors find their extrema along the 
axes when the center and orientation are fixed, we can bound 
the errors easily in this case. We simply state the results, for 
brevity. Vital information regarding quadrics was taken from 
[ 20 ]. 

Let two quadrics that differ only in their constant coefficient 
be given. Generalizing the notation from the conic case, let 
d X} d yy d z be the distances from the origin to the unperturbed 
conic (some may not be finite). Given a number e > 0 that 
also satisfies c < min(dx, d yi d z ) t and a difference in the con¬ 
stant coefficients of a quantity 6 , if the geometric perturbations 
Pz>Py,Pz are to satisfy 

max(|Px|. \Py\, |p*|) < e 

then it suffices to choose 6 such that 

|*| < e * min(dx • |Ai| y d v • |A 2 | ,d z • |A 3 |) 

where expressions for A. are the roots of a cubic polynomial 
0(A) whose coefficients are expressions in the coefficients of 
the quadrics. From data in [20], the quadric can be put in 
standard form in terms of the roots of 0(A), allowing the the 
quantities d Xy d Vi d z to be efficiently calculated. We omit the 
details here. 

Only considering real values of d Xf d y ,d z , then, we can bound 
the geometric error for a quadric due to approximate parame¬ 
terization. 
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Figure 3: Quadrics Perturbed in the Higher Order Coefficients 
Singular Cubic Curves 

For an irreducible, singular cubic plane curve, a parameter¬ 
ization algorithm is given in [2], which we analyze. While 
output formulas exist for this case, they are unwieldy, and in¬ 
stead we show how they can be derived, and the error in the 
parameterization. 

INPUT. A cubic plane curve given by the cubic equation 
f(x,y) = a 3 oy 3 + a2izv 2 + a2oy 2 + ai2X 2 J/ + an£V + aioy + 
a<B£ 3 + ao 2 X 2 -f aoi£ + aoo = 0. 

OUTPUT Rational functions (x(f), 0 (f)) of degree at most 
four, such that /(z(f), y(t)) = 0 . 

ALGORITHM. As in the conic case, the curve is transformed 
into a birationally equivalent one that is readily parameteriz- 
able. Several transformations are used. The steps are detailed 
below. If the cubic has a zero z 3 or y 3 term, the first step is 
omitted, otherwise the first step cancels y 3 . The computation 
is symmetric with respect to x. 

1. Apply a transformation that removes the y 3 term of /. This 
can be done via the linear transformation x = x \ -f qy \, y = y i, 
When applied to the cubic equation /(z, y) = 0, this yields 
a new cubic curve with equation /i(zi, y\) — 0 = f(x\ + 
QVu 0i) = L(q)yi+fi(xi,yi) Where L(q) = ao 30 3 + ai 2 0 2 + 
auq + U 30 - Choose q to be a root of L , i.e. L(q) — 0. Then 
the subexpression L(q)y\ must vanish, so we only need to 
parameterize the curve h(x\,y{) — 0 

2. Parameterize the cubic with equation / 2 (zi, yi) = 0, which 
has no y\ term, by transforming it into a transformed into a 
quadratic curve. / 2 is of the form 

M x u 2 /i) = 0 i(*i) 0 ? H- gi(x\)y\ + ^ 3 (^ 1 ) (5) 

where 01 , 02,03 have degrees equal to their subscripts. The 
discriminant of / 2 (with respect to 01 ) is simply 04 (zi) = 
02 (xi) 2 — 40 j(zi) 03 (zi). It can be shown that 04 (^ 1 ) must have 
a multiple root of the original cubic is singular, as assumed. 
By performing the following substitution 

2/2 = 20i0i4-0 2 (6) 


we have 

4 gi h = 4 g\y\ + 4 gig 2 yi + 4 jj g 3 

= ( 2 giyi + gif - (g 2 - 4 <ns 3 ) ( 7 ) 

— - 
= 02 “ <74 

Note that 04(^1) is a polynomial in zi of degree at most four. 
The curve is singular (and hence rational) if and only if 04 (zi) 
has a multiple root. This repeated root can be real or com¬ 
plex; only the real case is considered. Now for any num¬ 
ber r, expand the polynomial 04(11) in a Taylor series at r: 

<74(11) = —(zi — r)\ The terms of order higher 

than 4 are identically zero, 04 being a polynomial of degree 4 . 
Collecting coefficients of (zi — r) 2 yields 

<M(x 1) = ?2(zi)(zi - rf + g\{r){xi - r) + <7 4 (r) (8) 

where 02(^1) is of degree two. Now apply the substitution 
03 = 0 2 /(z 1 — r) together with (8) into the right-hand side of 
( 5 ); this leads to 

4 <71/1 = yl - <? 4 (xi) 

= ( y\ ~ 92(xi))(xi - rf + g'n(r)(xi - r) + g A (r) ( 9 ) 
= h{*\,yi) 

Chooser to be a multiple root of 04(zi): then04(r) = 0i(r) = 
0, and the subexpression 0j(r)(zi — r) + 04(r) must vanish. 
Therefore, to parameterize /3(^i, 03) we can simply param¬ 
eterize the conic curve corresponding to the quadratic factor 
C(xi, yi) = y\- qi(xi) = 0 . 

3 . Parameterize the conic with equation C(x\, 03) = 0 using 
the methods of the previous section. This yields a pair of 
rational functions (21(f), 03(f)) that satisfy C(zi(f), 03(f)) = 
0 . Applying all the transformations in reverse yields one for 
the input cubic. 

The cubic parameterization calls for computing a root q of 
the cubic polynomial L(q) t a multiple root r of the quartic 
polynomial 04 (zi), and a parameterization (si(f), 03(f)) of the 
conic with equation C(zi, 03) = 0 . Assuming, say, that the 
third conic transformation section was used, a pair of algebraic 
numbers ( 6 , c) need to be computed. 

Backward Error Analysis: Singular Cubics 

If all computations were exact, i.e. L(q) = 0 , 0 4 (r) = 0 , and 
^(*i(f)i 03 (f)) = 0, then the output will be correct. However, 
one may need to use approximations q , f and ( 5 , c), which will 
lead to an approximate output parameterization (z(f), y{t)). 
In this case one must measure the error incurred. Once again, 
a backward error analysis will be performed, beginning with 
back-substitution. 

LEMMA. The output parameterization will satisfy the implicit 
equation 

/(*.») -L(q)y 3 - 

<?(£, c)(z - qy - ff + gj(f)(g - qy - r) + g4 (f) _ 
4ffl(x - qy) ~ 
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PROOF. Given the approximate output parameter functions 
(x(t), y(t )), we compute /(x(f), £(<)). The subscript (t) is 
dropped for convenience. Then /(x, y) = f(xi 4- gyi, yi) = 
i(9)^l + /2(XI, J/l). 

Applying transformations in reverse and performing some al¬ 
gebraic manipulation, we find that 


/=(«!,.,) - ««., 

= (y3 2 ~ g2(xi))(x"i - fy + gj{r)(xi - r) + g4(r) 

4gi(x*i) 


Now C(xi(*), y 3 (t)) = y 2 — ? 2 (x"i), and since we assumed 
that the third conic transformation was used to parameterize C, 
it follows that there is a point (?, c) such that C^xi^), 1/3 (t)) = 
C(6, c). The lemma follows by substituting and expanding 
previous identities. □ 

If the values q, f , 6, c are exact, then 1(g) = g 4 (r) = g|(r) = 
C(6, c) = 0, and it is clear that the parametric output curve 
coincides with the implicit input curve. 

However, if the values are not exact, the output curve differs 
from the input curve. The coefficient perturbations are now 
present in many terms, not just one. 


Exact Solutions: Singular Cubics 


Finally, we show that in some cases, algebraic number com¬ 
putation is unnecessary for exact rational parameterization. A 
fact that appears to be known in Diophantine analysis is that 
a rational cubic curve with rational coefficients has a rational 
singular point 1 This was apparently not well-known in the 
geometric modeling community; it is mentioned in a book on 
Diophantine equations ([14]). 

Every rational cubic has a singular point. It is well-known (see, 
e.g. [21] for details) that such a cubic can be parameterized by 
a pencil of lines through the singularity, which then intersect 
the cubic at exactly one other point. The coordinates of the 
latter point parameterized by the slope of the line give parame¬ 
ter functions for the cubic curve. The parameter functions are 
given as closed form formulas in the parameter t , the coeffi¬ 
cients of the curve, and the coordinates (6, c) of the singularity, 
as shown below: 

X(t) — a^bi? — (3a3oc 4- a2o)< 2 — 

( 2 a 2 ic ai 2 b -f- an)f — (2aa3& + al2c + a®) 

Y(t) = — ((2a3oc 4* 0216 -f a2o)* 3 + 

(a2ic -f 2ai2& 4- an)f 2 4- (3ao3^ 4- aa i)t — acac) 

W(t) = ci3ot 3 4* fl2if 2 4- &l2t 4~ U 03 

Therefore, if extended precision rational arithmetic is allowed, 
one can theoretically parameterize an irreducible rational cubic 
curve without error and without algebraic number computation, 
by computing the singular point exactly, and substituting the 
coordinates in the above formula. One way to compute the 
singularity rationally is as follows. 

1 We are grateful to Allan Adler for alerting us to this fact. 



Figure 4: Exact and Perturbed Singular Cubics 


An affine singular point is found as a solution to the system 
of equations f(x,y) = fx(x,y) = f y (x,y) = 0. The x- 
coordinate of this solution will be a multiple rational root of the 
degree six polynomial p(x) = resultant(f(x , y) } / X (x, y ), y). 
The rational roots of a polynomial can be computed by apply¬ 
ing the algorithm in ([13]). The resultant is computed using a 
subresultant remainder sequence; this may then be used to com¬ 
pute the y-coordinate ([4]). Each (x,y) pair found this way 
can be tested whether it additionally satisfies f x (x , y) = 0; 
only one pair will satisfy the test. 

Extensions 

We have reformulated algorithms for rational parameteriza- 
tions in a finite precision domain. Algebraic numbers are 
approximated by rationals to produce an approximate parame¬ 
terization of an implicit curve or surface. For each method, we 
isolated the eiror due to the algebraic number approximation. 
The error formulas have useful geometric interpretations, some 
examples of which were given. In ongoing research, we find 
that the parameterization algorithms are quite stable. For in¬ 
stance, monoid parameterizations depend on the computation 
of the singular point (6, c) of a monoid curve. If a rational 
approximation (6, c) is calculated instead, we show that the 
monoid algorithm can be formulated so that the approximate 
output parametric curve will have a singularity at (6, c). In 
general, if a point is computed for a parameterization (conic, 
quadric, monoidal curve or surface), we show that the approx¬ 
imate point and the approximate output mimic the relationship 
of the exact point and the (exact) input (see Figure 4). 

The method of rederiving the algorithms to work properly in 
finite precision arithmetic works well for parameterization. In 
fact, we have been able to generate a formula to parameterize a 
cubic surface, in terms of its coefficients and certain algebraic 
numbers derived from them [15]. The rational parametric equa¬ 
tions derived are of the fourth degree, comparable to [2], [18]. 
Our algorithms will always succeed in producing output of the 
right degree when approximations are used. A straightforward 
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MACSYMA implementations of [2] fails in this case. This 
is because such algorithms require rational simplification of 
multivariate expressions, whereas simple divisibility proper¬ 
ties are destroyed by small numerical perturbations, leading to 
intermediate and output expression swell. 

While the general formula is very large, it is easily implemented 
in common programming languages and we expect it to be 
highly efficient. Furthermore, by specializing certain quantities 
in the formula, we hope to derive parameterization formulas 
for various families of cubic surfaces, and build a menagerie 
of controllable dual form surfaces. 

Finally, we have only considered these problems in a rational 
arithmetic setting (extended and finite precision). However, 
having reduced the parameterization algorithms to simple for¬ 
mulas involving only additions and multiplications, we would 
like to experiment with them in the floating point realm, also. 
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Abstract 

Current systems for the automatic generation of information 
presentation or the automatic illustration of objects are mainly 
generation-oriented, i.e. they generate solutions to a user who 
in turn has only few possibilities to lead a dialogue with the 
generation system. By contrast, systems which emphasize dia¬ 
logues tend to be weak in graphical interaction. 

In order to build systems which are both generation- and dia¬ 
logue-oriented, we present a new software architecture for 
computer graphics applications. We develop the concept of hy¬ 
per-rendering which produces a formal description of the 
scene as viewed by the user and can either be carried out within 
a Tenderer or in a separate program. The output of the hyper- 
rendering program is an image description which consists of 
information which conventional Tenderers usually compute but 
typically throw away. We describe two broad categories of ap¬ 
plications of hyper-rendering and a prototypical implementa¬ 
tion. Examples of the pictures produced as well as a sample 
session with an application are included. 

Keywords : rendering, image description, photorealistic imag¬ 
es, virtual reality, multi-media, interactive systems. 

1. Introduction 

Today’s graphics systems for producing photorealistic images 
consist of two main software components: a modeler with 
which a user can define a scene (model objects to be rendered, 
choose the camera position and perspective, etc.) and which 
produces as output a scene description; and a Tenderer, which 
accepts as input a scene description and produces as output the 
image. It is a fundamental characteristic of such graphics sys¬ 
tems that the output of the Tenderer is designed exclusively for 
viewing by a (human) user. While the machine produces the 
image based on a scene description, no information is stored in 
general as to what a user can actually see. This lack of informa¬ 
tion makes it impossible to establish a link between the Tender¬ 
er and an application program, although a main purpose of 


producing photorealistic images is communication. 

Indeed, rendering algorithms “waste” a great deal of informa¬ 
tion which could be gathered and represented explicitly. For 
example, hidden surface removal algorithms-as their name 
implies-throw away information about surfaces which are not 
visible to a user: But if a user wants to know which objects of 
the scene she or he cannot see? Such information pertaining to 
objective and subjective perception of the picture by the user 
and its relationship to the modeled scene is not made available 
systematically. 

In this paper we propose that the usability of graphics systems 
can be enhanced greatly by making information about the ren¬ 
dering process explicitly available to other programs. We refer 
to this process of deriving such information as hyper-rende¬ 
ring . Hyper-rendering produces a formal description of the 
scene as viewed by the user and can either be carried out within 
a Tenderer or, as in our prototypical implementation, in a sepa¬ 
rate program. The latter approach has the advantage that sepa¬ 
rate algorithms can be used for rendering and hyper-rendering 
of the same image, especially when hardware-rendering is 
used. Furthermore, when an image is to be produced with ray¬ 
tracing it may suffice to carry out the hyper-rendering with a 
fast z-buffer algorithm. Indeed, in certain applications it may 
even suffice initially just to cany out hyper-rendering and only 
later to perform the more time-consuming rendering. 

This paper is organized as follows: Previous research is sur¬ 
veyed in Chapter 2. In Chapter 3, we present a new software 
architecture for graphics systems with integrated hyper-rende¬ 
ring. Next, we point out two broad categories of applications 
of hyper-rendering and discuss the benefits to end-users. In 
Chapter 4, we survey the implementation techniques of our 
prototypical hyper-renderer. Chapter 5 gives an example of the 
capabilities of an application which uses this hyper-renderer 
and shows a sample dialogue session. Further work is dis¬ 
cussed in Chapter 6. 
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2. Background 

Our work on hyper-rendering is intended as a bridge between 
the generation-oriented systems which present "canonical” so¬ 
lutions to a mainly passive user, and the dialogue-oriented sys¬ 
tems which rely on an active user and which are able to answer 
questions. A representative of the first category is APT (A Pre¬ 
sentation Tool, [Mackinlay, 1986]). The system generates pre¬ 
sentations of two dimensional relational infonnation, where 
the two main criteria for the generation are expressiveness and 
effectiveness . Expressiveness criteria determine whether a 
graphical language can express the desired information, and ef¬ 
fectiveness criteria determine whether a graphical language ex¬ 
ploits the capabilities of the output medium and the human 
visual system. Thus, users get a large variety of "canonical” 
presentations fulfilling these two criteria. However, the tech¬ 
niques developed by Mackinlay are very difficult to extend to 
user-interaction, for example, for cases in which users wish to 
modify the presentation. The infonnation about such a modifi¬ 
cation of the layout is difficult to report back to the generation 
system, which is in addition not able to judge the quality of the 
modification. 

The WIP system (Knowledge-based Presentation of Informa¬ 
tion, [Wahlster et al., 1991]) generates a variety of multimodal 
documents from an input consisting of a formal description of 
the communicative intent of a planned presentation. The focal 
point of WIP is the generation of illustrated texts that are cus¬ 
tomized for the intended audience and situation, but interaction 
with the illustrations is not yet possible. 

A representative of generation-oriented dialogue systems is the 
IBIS system of [Seligmann and Feiner, 1991] (see also [Feiner, 
1985]). It is intended for the automatic generation of intent- 
based illustrations and shares several research interests with 
the WIP system, but differs in the system’s architecture, for ex¬ 
ample. An illustration is a picture that is designed to fulfill a 
communicative intent such as showing the location of an object 
or showing how an object is manipulated. The design of an il¬ 
lustration is treated as a goal-driven process within a system of 
constraints. The system uses a generate-and-test approach 
which relies on a rule-based system of methods and evaluators. 
Methods are rules that specify how to accomplish a visual ef¬ 
fect, while evaluators are rules that specify how to determine 
how well a visual effect is attained in an illustration. As users 
are able to manipulate illustrations interactively, IBIS main¬ 
tains the methods of visibility and recognizability during an in¬ 
teractive session. The current implementation supports user- 
controlled view specification, that is, the user can zoom objects 
or specify a new camera location. However, there is no possi¬ 
bility for a user to ask anything, for example about the interior 
of a particular object or to find out through a dialogue where 
shadows are located. However, the visibility methods of the 
IBIS system do provide very simple information for further 
processing through application modules. For example, the 
evaluator which calculates whether an object which must be 
visible is partially obscured returns a binary value. 


More flexibility is desirable for human-computer dialogues 
concerning the graphics, particularly in user interfaces using 
graphics for teaching purposes. Flexibility here refers not only 
to the viewing specification but also to supplementary infor¬ 
mation about the objects. To this end, [Strothotte, 1989] devel¬ 
oped a prototypical chemistry explanation system which 
generates pictorial explanations automatically and is capable 
of leading a dialogue with the user. For example, as an answer 
to the question "How is N 2 produced?”, several pictures show¬ 
ing the steps of the chemical production are presented. In this 
system, the user can manipulate the labels on the diagrams to 
obtain more infonnation. However, the flexibility of the dia¬ 
logues is attained at the expense of the quality of the graphics: 
Strothotte’s system relies on handmade bitmapped images, 
that is, users are not able to modify the pictures presented. 

Besides the research on the generation of information presen¬ 
tation, much work is carried out on algorithmic methods for 
computing the images to be presented. Improving the perfor¬ 
mance of image rendering can be done by using visibility pre- 
computations, since by performing work off-line they reduce 
the effort involved in solving the hidden-surface problem. In 
particular, many spatial subdivision techniques have been pro¬ 
posed for speeding up ray tracing ([Weghorst et al., 1984], 
[Glassner, 1989]) as well as for preparing interactive walk¬ 
throughs through complex environments, for example ([Teller 
et al., 1991]). Another method to improve the performance of 
image rendering is to convey most of the information to the 
user as early as possible, with image quality constantly im¬ 
proving with time, that is, to render images by adaptive refine¬ 
ment ([Bergman et al., 1986]). 

3. Working with a Hyper-Renderer 
3.1 The Process of Hyper-Rendering 

A software architecture for graphics systems using hyper-ren¬ 
dering is illustrated in Figure 1. The non-shaded boxes with a 
scene description, a Tenderer and the resulting picture are as 
used in conventional graphics systems. Conceptually, hyper- 
rendering software is built around the Tenderer. It computes 
various pieces of information about both the rendering process 
and the rendered picture, which conventional renderers either 
throw away or don’t bother computing in the first place. This 
information is stored in an "image description file”. 

The primary purpose of hyper-rendering is to allow users to do 
more with the rendered pictures than just look at them; this is 
facilitated by various kinds of applications which take as input 
the image description. Depending on the application, the scene 
description may also be used or modified. An application thus 
has information about what the user can see in the picture and 
is able to handle the dialogue with the user about the picture, 
evaluate the picture under certain criteria or even modify it so 
as to change the visibility of the objects. 
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Figure 1: A new software architecture for graphics systems with 
integrated hyper-rendering. The hyper-rendering software is 
built around the tenderer and computes various pieces of infor¬ 
mation about both the rendering process and the rendered pic¬ 
ture, which conventional Tenderers do not provide for further 
processing. 


3.2 Applications 

We consider there to be two broad categories of applications of 
hyper-rendering. The first category pertains to applications 
with direct end-user involvement. If a user is to engage in a dia¬ 
logue about the contents of the graphics with the machine, it is 
an absolute prerequisite that the application has at its disposal 
detailed information about what the user can see. This is par¬ 
ticularly important in virtual reality or cyberspace applications, 
when the user will ask for information and explanations about 
certain objects in the scene. It should be particularly clear here 
that the scene description itself is not sufficient for the machine 
for the purposes of leading such a dialogue, since a user's input 
must be interpreted with respect to what he can see (or in some 
cases more importantly what he cannot see) rather than only 
with respect to the model of the scene. The hyper-renderer can 
also be used in teaching material, when a pupil will ask for in¬ 
formation and explanations about certain parts of the graphics. 

The second broad application of hyper-rendering is in situa¬ 
tions in which a graphic shown to the user (or intended for dis¬ 
play) must be evaluated with respect to its appropriateness by 
the machine. Linking an application program to a Tenderer 
without modifying the latter is not possible without storing the 


information the renderer calculates. With our architecture, it is 
in particular possible to build knowledge based systems which 
analyse the modeling of a scene with respect to given con¬ 
straints, generate improvements, and prepare explanations for 
the user. For example, [Fischer et al., 1990] discusses the cri¬ 
tiquing approach to building knowledge based interactive sys¬ 
tems and describes a critiquing system for the 2D design of 
kitchens. A problem in 3D modeling, in particular when deal¬ 
ing with a virtual environment, is to avoid penetrating rigid ob¬ 
jects or having them collide. As standard methods for 
computing collisions have high computational complexity, 
[Pentland, 1990] describes more efficient methods for the cal¬ 
culation of dynamics, collision detection, and constraint satis¬ 
faction. Another deficiency of scene modeling is when parts of 
the scene where the observer should focus on lie in shadow. In 
this case, an improvement generator can suggest better loca¬ 
tions and parameters for the light sources. 

It is important to note that hyper-rendering is independent of 
the modeling software employed and can be used to improve 
the modeling process indirectly in contrast to other approach¬ 
es, where new modelers are developed (for example [Hall, 
1991]). This is particularly important, as many deficiencies in 
scene modeling can only be determined with significant effort 
by modelers. For example, only a renderer knows where shad¬ 
ows are located and how they influence the perception of the 
image. 

The hyper-renderer can, however, hardly be used to design 
good (or appropriate) presentations automatically. It can be 
used in the evaluation of a particular graphic with respect to 
communicative goals. If the evaluation is negative, other 
means must be found to correct the situation. 

In both these categories of applications, the facilities of a hy¬ 
per-renderer provide vital input to programs using photorealis¬ 
tic graphics as a communicative tool. The success of such 
applications depends on the sophistication with which the hy¬ 
per-renderer works to obtain information on the graphics and 
its perceptions by the user. 


4. A Prototypical Hyper-Renderer 

Rather than to integrate a hyper-rendering facility into an ex¬ 
isting renderer, we chose to work with a commercially avail¬ 
able renderer (in the present case Pixar’s RenderMan) and 
build a hyper-rendering facility as an extra program. While this 
has the disadvantage that certain code must be duplicated, it al¬ 
lows us in fact to use different algorithms for rendering and hy¬ 
per-rendering. 

Our hyper-renderer is written in about 5k lines of C code on an 
IRIS 4D35. Input is the scene description file in the format 
used by the RenderMan. To support maintainability of the 
scene description file and to facilitate dialogues about the 
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graphics produced by the Tenderer, we extended its format to 
include symbolic names of the objects as well as grouping of 
objects into compound objects. 

The hyper-renderer contains typical rendering algorithms 
which have been enhanced to record information symbolically 
about the graphics. In particular, an extended z-buffer-algo- 
rithm was implemented for hidden-surface removal; however; 
as opposed to conventional z-bufFers, information about hid¬ 
den surfaces is stored, not thrown away. Our implementation is 
related to Atherton’s implementation of an object-buffer 
([Atherton, 1981]). However, while Atherton’s three-dimen¬ 
sional display buffer was implemented in the form of a solid 
object description, we approximated quadric surfaces through 
polygons. Although the basic algorithmic method used is not 
new, the output produced by the hyper-renderer, namely the 
image description file, is a significant enhancement. Finally, we 
used our z-buffer to generate shadows which are caused by 
opaque objects, as proposed by Atherton as well (see also [Ap¬ 
pel, 1967] and [Bouknight et al., 1970]). 

It is important to note that the resolution of the hyper-rendering 
algorithms need not be the same as the resolution of the actual 
rendered picture; the resolution is determined dynamically by 
the application program. A coarser hyper-renderer suffices for 
many applications and means that the results of hyper-rende¬ 
ring can be made available significantly before the rendered 
picture is in fact available. 

By default, our hyper-renderer carries out a fast z-buffer-algo- 
rithm in a first pass. By recording which objects are in the line 
of sight of each pixel, and with information about which sur¬ 
faces cause, for example, specular reflection or refraction, 
those parts of the scene which require ray tracing are deter¬ 
mined. This way, our ray tracer, which is currently being im¬ 
plemented, will be “selective”, that is, restricted to such parts 
of the picture as are deemed necessary. Furthermore, the visi¬ 
bility information supplied by the z-buffer algorithm can be 
used for hidden surface removal which makes the ray tracer it¬ 
self much more efficient (this concept is related to that of an 
“item-buffer” as described by [Weghorst et al., 1984]). The 
calling application program can, however, override these “fan¬ 
cy” features and force the use of particular hyper-rendering al¬ 
gorithms. 

The output of our hyper-renderer is an image description. It is 
implemented as a file organized in an object-oriented manner. 
The names of objects of the scene description file are associat¬ 
ed with various pieces of information, including low-level data 
as to pixels affected by the object and the colors as well as 
high-level information pertaining to the visibility (and invisi¬ 
bility) of objects, their interior, the intersection of objects, cer¬ 
tain prepositional attributes (in front of, behind of, on,.,.) as 
well as a description of which objects lie in shadows. Figure 2 
summarizes the data produced by our hyper-renderer. 

Note that our prototypical hyper-renderer is by no means com¬ 
plete in the sense that is conceivable that new applications will 


require more or different information. Indeed, the modular im¬ 
plementation of the hyper-renderer supports extensions by a 
systems programmer. In particular, the complex information 
alluded to in Chapter 3.2 needs further study before its extrac¬ 
tion can be implemented. 
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Figure 2: Information contained in the image description file . 
The hyper-renderer produces as output for each object and its 
parts a semantic net which contains information about visibil¬ 
ity and invisibility, shadows, the interior of the object, other 
objects intersecting it, prepositional information, information 
about the position of light sources relative to the object, its re¬ 
flection properties, and its material properties. 

5. An Example 

5.1 A Prototypical Application 

In order to demonstrate the capabilities of hyper-rendering, we 
developed a simple dialogue system for navigating in a scene. 
The dialogue is conducted in a restricted natural language 
which allows a user to specify the kind of graphics to be dis¬ 
played (wire-frame or full-suiface-rendering). The most sig¬ 
nificant feature of the application is that it allows the user to 
formulate a constraint on the view, upon which the application 
computes a change in the scene description file and initiates re- 
rendering and re-hyper-rendering of the graphics. 


5.2 The Dialogue and a Sample Session 

We will explore the interior of an IKEA cupboard. A descrip¬ 
tion of the cupboard and its contents was designed with a mod¬ 
eler and stored in a RenderMan “rib”-file. The end-user now 
uses the application to draw the picture by typing an appropri¬ 
ate command: 
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Application: Please enter command. 

User: Draw cupboard. 

The RenderMan full-surface renderer is invoked by the appli¬ 
cation and produces Figure 3. 

The user now wishes to see more of the object in the picture 
and decides to switch to a wire-frame image. 

Application: Please enter command. 

User: Draw wires. 

The RenderMan is invoked again and produces Figure 4. 

The user realizes that the wire-frame image does not provide 
enough information to find out what is in the cupboard. Hence 
he asks for information on what he cannot see: 

Application: Please enter command. 

User: What can I not see? 

Application: Ball, box, safe. 

In this latter response, the application has made use of the hy- 
per-renderer tool. The image description contains information 
about the visibility of objects and from this it is easy to com¬ 
pute which objects are not visible. 

The user now wishes to look at a particular object, the safe. By 
looking at the wire-frame image, he recognizes that there are 
two candidate objects which can be a safe (one is on the bottom 
and another is below the top on the left side). Hence, he enters 
the following command: 

Application: Please enter command. 

User: Show safe through glass. 

The application responds with Figure 5. 

The application used the information in the image description 
file to determine that it was the cupboard itself which blocked 
the user's view of the safe. It then changed the material of 
which the upper left part of the cupboard is made to “glass” in 
the scene description and drew the resulting picture. 


5.3 R£sum6 

The dialogue illustrated above is a simple example of the use 
of hyper-rendering. While it is in principle possible to extract 
such information as “What can I not see?” directly from the 
scene description with a significant amount of work, we be¬ 
lieve our hyper-rendering to be the first general purpose tool for 
interactive graphics which allows an application to determine 
this kind of information in a simple manner. Furthermore, im¬ 
plementing a command such as 41 Show safe through glass”, 
specifying a constraint on the visibility of a scene is easily ac¬ 
complished when a hyper-renderer is available but would be te¬ 
dious to program without such a tool. 


6< Concluding Remarks 

In this paper we introduced the concept of hyper-rendering 
which makes information about the rendered image available 
to an application. We argued that such information is useful for 
providing information about the graphics to end-users as well 
as to evaluate the graphics displayed. We demonstrated the im¬ 
portant facilities of the hyper-renderer with a sample dialogue 
application. 

The concepts we introduced open up a range of new problems 
for further study. One area is to use hyper-rendering to improve 
the rendering process itself. As the performance of the rende¬ 
ring process can be improved by refinement, that is, by subdi¬ 
viding it into phases where the results of each phase are carried 
over to the next phase ([Bergman et al., 1986]), hyper-rende¬ 
ring could be used to supply additional formalized information 
which in turn speeds up the performance of a succeeding 
phase. 

In addition to this, hyper-rendering provides a tool for devel¬ 
oping more effective multi-media dialogue systems. Since 
more information about the graphics is available as a result of 
hyper-rendering, speech or text interaction concerning or inte¬ 
grated with the graphics has a better chance of being appropri¬ 
ate. In particular, hyper-rendering can offer additional 
interaction facilities and information for users who are directly 
placed in a virtual environment or cyberspace. For example, 
users could ask questions about their location and the objects 
they are looking at Therefore, hyper-rendering could prevent 
users from being “lost in cyberspace”. Furthermore, if users 
have difficulties in interpreting the graphics output by the ma¬ 
chine, it is in turn possible to train them to see and to guide 
them where to look. For example, during an interactive presen¬ 
tation it is often difficult to look at the <4 right” objects dis¬ 
played, as humans are, in general, not trained in this capability. 
Using our hyper-renderer, it is possible to develop interactive 
multi-media systems which allow users to lead dialogues about 
the graphics with the goal of discovering what is important. 
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Figure 3 (top left): The rendered image of the IKEA cupboard. 

The user is unable to see many objects, such as those lying inside or behind the cupboard. 
Figure 4 (top right): The rendered wire-frame image of the cupboard. 

All objects are visible, though largely indiscernible. 

Figure 5 (bottom): The modified image showing the inside of the cupboard. 

The user previously recognized that there were two candidates for the safe in the wire-frame im¬ 
age. He thus entered the command “Show safe through glass”. Based on the information placed 
at the disposal of the application by the hyper-renderer, the application converted the material 
of which the upper left part of the cupboard is made to glass, thereby offering the user a view 
of the objects of interest to him. 
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Abstract 

Sound has the potential to improve our understanding of a 
program’s function, structure, and behavior. In this paper 
we identify classes of program information suitable for 
mapping to sound and suggest how to add auralization 
capabilities to programming environments. We describe 
LogoMedia, a sound-enhanced programming system which 
illustrates these concepts. 

Keywords: Program Auralization, Non-Speech Audio, 
Software Visualization, Programming Environments 

Program auralization 

Gerald Weinberg, author of The Psychology of Computer 
Programming, describes programming as “at best a com¬ 
munication between two alien species” (Weinberg, 1971). 
Indeed, despite efforts in the field of software visualization, 
programs are in desperate need of better means of presenta¬ 
tion and clarification. This paper explores the potential of 
non-speech audio to increase the communications band¬ 
width between the two “species.” 

Non-speech audio is quickly becoming an integral part 
of the computer’s ability to record and present data, as is 
evidenced by recent introductions of computer systems 
with built-in high quality sound input and output from 
graphics workstation manufacturers such as Sun 
Microsystems (Yager, 1991) and Silicon Graphics (Smith, 
1991). Interface designers have only just begun to tap into 
the capabilities of this new hardware. Although sound has 
seen limited use in some software visualization systems 
(Baecker, 1981; Brown, 1988), it is only recently that 
computer audio has been seriously applied to the program¬ 
ming domain to help elucidate the behavior of running 
programs (Jackson and Francioni, 1992; Sonnenwald, et 
al„ 1990). Program auralization refers to the use of non¬ 
speech audio for supporting the understanding and effective 
use of computer programs. 

The first part of this paper discusses some properties 
of sound that are useful to the human-computer interface. 
We then propose a program auralization taxonomy describ¬ 


ing how sound may be associated with execution behavior 
and with the structure of the code. For each of the three cat¬ 
egories in the taxonomy we discuss the relevant character¬ 
istics of sound and useful types of program auralization 
tools. We conclude by describing our implementation of a 
sound-enhanced Logo programming environment called 
LogoMedia. 

Non-speech audio at the interface 

Sound has unique properties which can be exploited in the 
computer-human interface. For certain types of data sound 
is the most intuitive means of understanding the informa¬ 
tion. Experiments mapping time-varying economic data to 
sound have found that humans can more effectively identify 
correlations using sound than with graphics (Mezrich, 
Fry singer, and Slivjanovski, 1984). Logarithmic data, 
normally difficult to perceive graphically, has also been 
shown to benefit from audio portrayals since pitch and 
loudness are logarithmic functions of frequency and inten¬ 
sity (Buxton, Gaver, and Bly, in preparation, ch. 3). Gaver 
has demonstrated the ability of “everyday” sounds to alert 
users instantly yet unobtrusively about certain events 
(Gaver, 1989; Gaver, 1990; Gaver, 1991a; Gaver, 1991b). 

Sound can be used to relieve the burden of the visual 
interface. With the increasing popularity of graphical 
interfaces, more and more applications are using images to 
portray information. The resulting collage of windows and 
colors has the potential of overloading the human visual 
system. The computer industry is also moving toward 
smaller, more portable computers with displays limited by 
current technology to fewer colors, less pixels, and slower 
update rates. The effective use of sound offers an attractive 
design solution to problems introduced by both trends, 
providing a new output modality to complement the graph¬ 
ical interface. 

The multi-dimensionality of sound makes it useful for 
presenting complex data which is otherwise difficult to rep¬ 
resent. Sound can be systematically varied across many 
dimensions including loudness, pitch, vibrato, rate of 
modulation, timbre, and tempo (Buxton, Gaver, and Bly, in 


Graphics Interface ’92 



45 


preparation). Some works (Yeung, 1980) suggest human 
can differentiate sounds of up to 20 dimensions. Another 
useful property of sound is that humans do not necessarily 
have to be facing the source in order to hear it. This means 
that the focus of attention can be applied to the computer 
screen, a printout, or even a cup of coffee while continuing 
to process auditory information. 

A program auralization taxonomy 

Auralizations appropriate to the programming domain can 
be divided into three major categories according to the 
characteristics of the information streams mapped to sound 
sequences. Each sound generated as part of an auralization 
corresponds to some item in a time varying stream of data. 
We characterize these streams by their sound generators — 
the activity which determines which datum comes next for 
the purpose of auditory mapping. Probably the most obvi¬ 
ous information stream is the sequence of program states 
during execution. In this case the sound generator is the 
running program itself. As shown in Table 1 we have iden¬ 
tified two other unique sources of program sounds. Each 
activity corresponds with a program development phase 
and covers a particular type of program information. 



Sound 

Program 

Phase 

Generator 

Information 

execution 

executing 

variables, internal state, 
control flow, backtracking 

review 

scrolling 

modules, goals/plans, 
keywords 

preparation 

parsing 

syntactic structure 


Table 1: Three categories of program information which 
are candidates for auralization 


Execution 

The execution of a program causes variables and machine 
state to change over time. By mapping this data stream to 
sound the programmer can listen to his or her code run 
which has the potential to reveal useful information about 
the behavior of the program. 

Review 

A programmer examines his or her code by scrolling 
through the text in a window. The sequence of program sec¬ 
tions encountered can be mapped to sound, helping provide 
contextual information while using the visual information 
from the screen for detailed assessment. 


Preparation 

While compiling or entering a program, the computer eval¬ 
uates language tokens and identifiers in a specific sequence 
not necessarily corresponding to the stream of characters 
making up the code. This sequence can be translated into 
sound, allowing the programmer to monitor the progress of 
such activity. 

An Example 

Consider the following example programming session in 
which we auralize the process of preparing, executing, and 
reviewing a procedure to compute the factorial function. 
Figure 1 shows one way of performing this calculation 
using Logo (Harvey, 1986). A single call to factorial 
generates a series of nested recursive calls which multiplies 
a series of numbers decreasing by one from the original 
value to one. 

1 to factorial :num 

2 if :num « 0 

3 [output 1] 

4 [output (:num * factorial (:num-l))l 

5 end 

Figure Is Logo code for computing factorial 

While typing line 4 we must be careful to match all of 
the nested delimiters. Using a standard editor for entering 
the code, we could easily skip one of the right parentheses. 
A sound-enhanced programming environment in which 
sounds are generated during program entering can inform us 
immediately of such an error. Typing the left bracket 
might initiate an unobtrusive continuous background sound 
such as the soft ticking of a clock as depicted in Figure 2. 
The following left parenthesis might start the background 
sound of a bubbling fish tank which is layered on top of the 
ticking noise. A fan noise might result from typing yet 
another left parenthesis. Each background sound continues 
playing until its respective matching delimiter is typed, so 
that by the end of the line none of the three channels 
should be audible. Now if we skipped one of the right 
parentheses we would reach the end of the line without the 
bubbling stopping—an obvious indication of a problem. 

Suppose we mistakenly implemented factorial as 
shown in Figure 3 without the base case, resulting in a pro¬ 
gram which runs forever or at least until stack memory is 
exhausted. When we call our flawed factorial procedure 
from the Logo interpreter with the argument of 20, the 
computer would pause for a while and finally produce an 
error message. Suppose we suspected the problem was actu- 


[ 

[output ( 

[output (:num * factorial ( 

[output (:num * factorial (:num-l) 

[output (:num * factorial <:num-l)) 

[output <;num * factorial <:num-l))] 


ticking 

ticking and bubbling 
ticking, bubbling and fan 
ticking and bubbling 
ticking 
silence 


Figure 2: Delimiter matching sounds while entering line 4 of the procedure in Figure 1 
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ally with the value of the parameter : num. Was it actually 
decrementing for each successive call, or was the computer 
using the outermost runtime stack frame to determine the 
value of :num? Using a sound-enhanced programming 
environment we can monitor the execution behavior of this 
and other parameters. We might associate a continuous 
sound which varies in pitch with the value of : num as the 
program executes—the lower the number, the lower the 
pitch. Now when we execute the program, we would hear a 
note which starts high but quickly drops in pitch. It would 
be apparent that : num is indeed being decremented and our 
error must be elsewhere. 

1 to factorial :num 

2 output (:num * factorial (:num-l)) 

3 end 

Figure 3: Factorial procedure with missing base case 

If the factorial procedure were part of a larger program 
it might help us to understand the code if we knew how 
often factorial gets called and by which components. A 
sound-enhanced programming environment can be asked to 
generate a sound each time the identifier factorial 
scrolls through the editor window. The sound warns us in 
advance to pay closer visual attention to the code coming 
into view. 

Related audio applications 

In recent years the use of non-speech audio at the interface 
has become a serious study for interface designers. A semi¬ 
nal work by Gaver (1989) was the Sonic Finder which used 
everyday sounds to indicate common operations in the 
Macintosh direct manipulation interface. Dragging the 
mouse generated a scratching sound. The act of deleting 
files by dropping their desktop icons into the trash emitted 
a crash. The mouse cursor at times also acted like a mallet, 
creating sounds as the user clicked on various icons on the 
screen. Large file icons had a low hollow sound, while 
smaller files had a higher pitch. 

Following the Sonic Finder were ARKola (Gaver, 
1991a) and EAR (Gaver, 1991b) which used sophisticated 
event-driven sounds. ARKola simulated the sounds of a 
collection of bottling plant machines running simultane¬ 
ously in various stages of disrepair. EAR sounds included 
paper falling, the shuffling of people gathering in a room, 
and the pouring of a pint of beer to remind users of various 
events and conditions taking place or about to take place at 
Xerox EuroPARC. 

Other studies have focused on using sound to represent 
complex computer data, referred to as data auralization. 
Exvis (Smith, Bergeron, and Grinstein, 1990) used both 
graphics and sound to portray various types of data includ¬ 
ing magnetic resonance scans which varied across many 
dimensions. As the user moved the cursor over the graphi¬ 
cal representation, data points emitted a characteristic 
sound with frequency and intensity related to particular 
dimensions of the sample. Bly (1982) tested human 


analysts on six-dimensional data which she mapped to six 
sound characteristics: pitch, volume, note duration, 
fundamental waveshape, attack envelope, and overtone 
waveshape. She found that the information content of the 
data could indeed be enriched through audio. 

As an aural alternative to graphical icons Blattner, 
Sumikawa, and Greenberg (1989) proposed “earcons” for 
denoting a variety of computer events. Earcons consisted 
of single notes, short melodies or combinations of other 
earcons. The authors suggested that properly designed 
earcons could be learned quickly and associated with arbi¬ 
trary objects and computer operations. 

Recent works have demonstrated the potential of audio 
in the programming domain. Jackson and Francioni 
(1992) used audio to improve the programmer's awareness 
of the behavior of parallel programs by generating sounds 
based on trace data recorded during execution. Program 
events being monitored caused a unique note or melody to 
be played using a particular timbre mapped to each of 16 
processors. Music theory was used to identify the most 
appropriate melodies for each event. Soimenwald, et al. 
(1990) developed a set of primitive function calls for 
incorporating sound in program code for the purpose of 
elucidation. Although the authors primarily discussed the 
capabilities of their system for portraying parallel 
programs, its general audio functions could be applied to a 
variety of programs both concurrent and non-concurrent. 
While animating algorithms Brown and Hershberger 
(1991) generated sounds corresponding in pitch to 
elements being inserted into a hash table, items being 
sorted, and to the number of active threads. The sound of a 
car crash we used to indicate hash collision. Brown and 
Hershberger identify four main uses of sound in algorithm 
animation: “reinforcing visual views, conveying patterns, 
replacing visual views, and signaling exceptional 
conditions.” 

Sound in the programming domain 

Most of the above examples deal with sound in specific 
applications. In contrast this paper discusses the uses of 
non-speech audio in a much more generic context—the 
programming environment. Auralization in the program¬ 
ming environment is a challenging interface design prob¬ 
lem since the sounds must be adaptable to a variety of con¬ 
ditions. A sound-enhanced programming environment 
must represent both data and events aurally. A further com¬ 
plication is that the exact types of data and events are 
unknown, since the programming environment is meant to 
be used to develop arbitrary applications using a variety of 
data and control structures. Finally, depending on the stage 
in the program development process, the programmer may 
use sounds in different ways. When debugging, for 
instance, the programmer may want to introduce generic or 
symbolic sounds quickly to determine the behavior of his 
or her program. When creating auralizations for presenta¬ 
tion purposes, he or she may want more specific or iconic 
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sounds to portray more appropriately the program execu¬ 
tion. 

Figure 4 compares non-speech audio applications 
based on the type of information they represent using 
sound as well as their articulatory directness . Articulatory 
directness as defined by Hutchins, Hollan, and Norman 
(1986) is the degree to which form follows function in an 
interface. Gaver (1989) used this measure to describe the 
intuitiveness of the perceptual mappings used to link the 
“model world” of the computer and the audio display, with 
symbolic being the least intuitive and iconic the most. A 
symbolic mapping is an arbitrary association made 
between sound and computer information, whereas an 
iconic mapping is based on well-known physical 
properties. Exvis and Bly's work which focus on data 
auralization belongs near the symbolic end of the 
articulatory directness scale because they relate properties 
of sound such as pitch and loudness to data which has no 
inherent connection with sound. ARKola and EAR on the 
other hand use iconic mappings because of the obvious 
associations between events and their sounds. 


Iconic Data Events 



Figure 4s Comparison of non-speech audio applications 

The intersection of the two circles in the figure con¬ 
tains those applications using non-speech audio to repre¬ 
sent both data and events. The Sonic Finder belongs in 
this category because of the way impact sounds suggest the 
type and size of files. Another application for this cate¬ 
gory is program auralization systems which portray both 
data and control flow using sounds. A sound-enhanced pro¬ 
gramming environment, represented by the shaded area of 
the figure, must use sound to portray both data and events. 
An area is more appropriate than a point because the pro¬ 
gramming environment must be capable of using sound to 
portray a variety of different types of data and events at dif¬ 
ferent levels of articulatory directness. 

Execution sounds 

Comprehending the course of execution of a program and 
how its data changes is essential to understanding why a 
program does or does not work. Auralization expands the 
possible approaches to elucidating program behavior . 

Protocol studies of novice, intermediate and expert pro¬ 
grammers debugging code suggest they make use of a vari¬ 
ety of techniques for observing the changing values of 


variables and monitoring sub-program calls (Nanja and 
Cook, 1987). As shown in Table 2 we can divide execution 
information into values and events. Value information 
refers to the contents of data structures as they change dur¬ 
ing execution, often known as data flow. Event informa¬ 
tion is the stream of operations on these data structures as 
well as the flow of control from one line of code to the next 
and from one sub-program to the next 


Information 


Type 

Values 

Events 

common 

queue size, tree 

loop, branch, push. 


depth 

pop, search, sort 

arbitrary 

a, b, c 

sub-program calls, 
sub-program returns 

internal 

call stack size. 

register usage. 


memory space 

backtracking 


Table 2: Candidates for auralization during execution 


For both values and events we group the execution 
information into three categories: common, arbitrary, and 
internal. Common information refers to typical data and 
control structures which we can anticipate will be the sub¬ 
ject of interest to the programmer. Sound enhancements to 
the programming environment should include default 
methods for monitoring the status and usage of these struc¬ 
tures. Arbitrary execution information refers to the types 
of data and control flow which cannot be predicted. For 
this type of information a sound-enhanced programming 
environment should provide tools for establishing custom 
auralizations. Internal information is the changing 
machine state over the course of execution which is largely 
programming language and machine dependent. Default 
auralization for this type of data can increase the program¬ 
mer’s awareness of activities such as resource usage occur¬ 
ring behind the scenes. 

Monitoring execution values 

Traditional techniques for monitoring values during execu¬ 
tion generate text to indicate the current program state. An 
obvious method of displaying text relating to program 
variables is to insert output statements directly into the 
source code. Debuggers allow the values of variables to be 
printed without having to modify the source code. A popu¬ 
lar debugger for the Unix environment, dbx provides a set 
of commands for maintaining a list of expressions contain¬ 
ing program variables to be printed during execution. 

Software visualization systems use pictures to present 
often more complex program data in ways which are simply 
not possible using text. Visualizations are useful for 
making abstract program information easier to understand. 
The University of Washington Program Illustrator, for 
instance, recognizes the abstract data structure for digraphs 
within Pascal programs and represents them on the screen 
as a collection of circles connected by arrows (Henry, 
Whaley, and Forstall, 1990). Pictures of programs are 
well-suited for displaying large quantities of computer 
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information simultaneously. In the film Sorting Out 
Sorting Baecker (1981) uses graphs to show the progress 
of several sorting algorithms on thousands of data items. 

Data auralization provides a useful extension to textual 
and graphical techniques for monitoring execution. 
Although not suitable for conveying exact values, auraliza¬ 
tion can indicate trends and increase the number of dimen¬ 
sions capable of being presented simultaneously. 
Auralizations of program data generate sounds varying 
across dimensions of pitch, volume, reverberation, pan¬ 
ning, envelope, and tempo in response to the changing 
values of program variables. An important advantage over 
visual feedback is that auralization frees the programmer to 
inspect the actual code while it is executing. 

Sound can be useful in conveying simple auxiliary 
information relating to common data structures such as the 
size of a queue or the depth of a stack. Ideally, these aural¬ 
izations of high level data structure values could be built 
into the language or class library. Thus, the user is free 
from having to specify the particular element of the data 
structure to map to sound. 

For listening to arbitrary data a sound enhanced pro¬ 
gramming environment must support the ability to map 
changes in variable values to a variety of audio dimen¬ 
sions. The mapping must be flexible so that the user can 
experiment with various auralizations in order to fmd the 
one most appropriate for his or her application. Users 
should be able to select from a variety of synthesized and 
prerecorded sounds, or make their own sounds. To support 
this type of opportunistic auralization in LogoMedia we 
have developed an audio expressions tool for making data- 
sound associations. 

Internal values in an executing program refer to 
machine states not normally known to the programmer. 
The size of the call stack and the amount of free memory are 
two examples of internal values. The default sounds 
available for internal values should easily fade into the 
auditory background of the listener. Patterson’s study of 
alarms (1989) provides useful guidelines for designing 
non-obtrusive sounds such as the use of slow onsets and 
rhythmic patterns. 

Monitoring execution events 

Traditional techniques for monitoring execution events 
produce a sequence of text lines indicating control flow. 
The simplest method requires no additional programming 
environment tools: inserting lines of code before or after 
suspect statements which cause a message to appear on the 
screen. More sophisticated program tracing approaches are 
found in debuggers such as dbx which can automatically 
generate generic tracing information while simulating the 
execution of a program. 

Software visualization systems use pictures or dia¬ 
grams to indicate the execution path. Some visualization 
systems such the Transparent Prolog Machine (TPM) 
(Eisenstadt and Brayshaw, 1988) can graphically trace pro¬ 
grams automatically. Other software visualization systems 
offer more customizable and therefore potentially more 


salient trace feedback than traditional debuggers. 
LogoMotion (Baecker and Buchanan, 1990) is a program¬ 
ming environment in which procedure calls can trigger tai¬ 
lored visualization code. 

Computer audio can enhance textual and graphical exe¬ 
cution event information by generating sounds in conjunc¬ 
tion with the execution of a line or a collection of lines and 
by using multiple voices to indicate layers in a calling 
chain. As with program data, control flow auralization al¬ 
lows the programmer to focus his or her visual attention on 
the actual code being run. Tracing programs using audio 
can also uncover repeated patterns of program behavior. 
Jackson and Jackson (1992) noted that users of their sys¬ 
tem for auralizing parallel program events recognized 
“melodies” characterizing various communication patterns 
between processors. They also pointed out that missing or 
delayed parts of the pattern were noticeable. 

A sound enhanced programming environment should 
allow users to associate sounds with common, arbitrary, 
and internal events. Common events include the execution 
of control structures such as loops and branches as well as 
operations on common data structures such as stacks and 
linked lists. For LogoMedia we have developed sounds of 
plates stacking, unstacking, and breaking for the typical 
push, pop, and overflow events. Additionally, there are 
sounds for indicating list operations using the metaphor of 
a three-ringed binder—popping open for an insertion, tear¬ 
ing for a deletion, and page flipping sounds for a search. 

Arbitrary events of interest to the programmer might 
be calls to sub-programs, returns from sub-programs, or the 
execution of particular lines. Sound enhancements to the 
programming environment should include the ability to 
generate sound in response to the program counter reaching 
arbitrary lines of code. In LogoMedia entering or exiting a 
procedure can trigger auxiliary Logo code for turning 
sounds on or off or changing their quality. We are currently 
developing a tool for specifying program audiopoints 
instead of breakpoints. 

The ability to monitor language or machine dependent 
internal events using sound is certainly worthy of further 
study. Low level internal information such as operations 
on registers and high level information such as Prolog 
backtracking may prove to be good candidates for auraliza¬ 
tion, since the programmer is more likely interested in the 
general pattern of activity, rather than exact events. 

Review sounds 

The review phase when the programmer interactively ex¬ 
plores the source code is another opportunity for integrat¬ 
ing sound into the programming environment. 

Studies in program comprehension (Gellenbeck and Cook, 
1991; Kesler and Uram, 1984) have reported the utility of 
information supplementary to the raw code in providing 
clues to help programmers understand a program and predict 
its behavior. Examples of this kind of information include 
indentation, comments, typographic signaling, mnemonic 
names, module organization, goals and plans, profiling 
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data. Sound offers a new modality for communicating this 
ancillary information. Unique to this phase is the method 
by which sequences of sounds are derived from the series of 
points in the code on which the programmer focuses his or 
her attention. 

Visualizations of software can enhance the program¬ 
mer's understanding of programs by providing a variety of 
views which either compress or elide uninteresting infor¬ 
mation. Baecker and Marcus illustrated a collection of 
information-rich typographic overviews in SEE (Baecker 
and Marcus, 1990). Small's static typographic visualiza¬ 
tion system, Viper, revealed the advantages of interactive 
systems for culling, clarifying, and amplifying parts of the 
program text interesting to the user (Small, 1989). Viper 
allowed the programmer to specify code Biters used for 
remove certain lines of code and highlight statements in 
others. 

Listening to source files 

Computer audio offers an attractive alternative to textual 
enhancements to the code for conveying information about 
the structure or intent of a program. Sound-enhanced pro¬ 
gramming environments should allow portions of code at 
the focus of the user's attention to be mapped to sound. 
Perusing the code causes the system to generate a sequence 
of sounds, providing an auditory context for more detailed 
visual examination. 

Auralizing the source code requires that interesting 
portions of the text somehow be identified for mapping to 
sound. This can be done automatically by parsing the code 
and flagging syntactic structures such as blocks, and proce¬ 
dures as is done with Viper. Profiling and verification sys¬ 
tems such as Unix prof or lint can automatically associ¬ 
ate other kinds of useful data with sections of code. A semi- 
automated alternative is using a structured editor which can 
identify syntactic structures and problem solving strate¬ 
gies. Finally, the programmer can manually select the sec¬ 
tions of code for auralization. Although this last option 
may seem tedious, the programmer may be able to take ad¬ 
vantage of auralizations already in place for the purpose of 
monitoring events during execution. If the programmer 
had been thorough in marking a variety of interesting pro¬ 
gram events with audiopoints, the events themselves can 
serve as "audio landmarks” (Jenkins, 1985) into the code— 
a form of audio documentation. 

In LogoMedia we are developing the ability to gener¬ 
ate audio feedback while scrolling through a L/)go docu¬ 
ment. As interesting items enter and leave the window 
view the sound will change accordingly. Program con¬ 
structs identified during interpretation such as procedure 
definitions, Logo lists, and lines with audiopoints will 
each have characteristic sounds. Programmers can perform 
an auditory search of their code by asking the system to 
generate sounds if the items in focus meet certain criteria 
such as if they are an identifier of a certain name or a partic¬ 
ular Logo command. 

Rapid scrolling through a program may generate 
emergent sounds patterns indicative of particular types of 


code or a certain style of programming. Speeth (1961) 
found that subjects were able to discriminate between earth¬ 
quakes and explosions using compressed auralizations of 
large quantities of seismic data which were not visually dis¬ 
tinct. Because of sound's ability to reveal patterns in mas¬ 
sive quantities of data, new insights may result from the 
rapid audio review of complex program. 

Preparation sounds 

The process of preparing code for execution can be myste¬ 
rious to the programmer . In a sound-enhanced program¬ 
ming environment the order in which programming lan¬ 
guage constructs are parsed can be a source of sound 
sequences and increase the programmer's awareness of this 
ongoing activity. While entering a program syntax- 
directed audio feedback can help identify syntactic errors. 
During compilation sounds can be used to monitor 
progress . 

Syntax-directed editors attempt to reduce obvious syntactic 
errors through typographic feedback during code entry. In 
some systems keywords recognized by the editor change 
font after the programmer types a separator such as the 
space bar. Lines following an IF clause are automatically 
indented until the block is terminated with a semi-colon. 
An obvious problem with syntax-directed editors is that 
they force users to adopt a particular typographic style. 
The persistence of typographic syntax-directed feedback 
may also be an annoyance to programmers. The style of 
the program text remains changed even after the words have 
been typed and the programmer is sure it has been entered 
properly. 

Listening to program entering 

A sound-enhanced programming environment offers the 
programmer an alternative output modality for conveying 
similar syntactic feedback during program entering. An 
approach we are investigating for LogoMedia is to provide 
subtle background sounds to indicate whether the computer 
recognizes program constructs which are typed and to dif¬ 
ferentiate aurally between classes of syntactic structures. 
Word classes might include operators, built-in procedures, 
control constructs, previously declared procedure identi¬ 
fiers, previously declared variable identifiers, new proce¬ 
dure identifiers, and new variable identifiers. Discrete 
sounds are triggered by completing a typed word with a 
delimiting character such as the space bar or comma. After 
a fixed period of time the sound terminates. Following 
Patterson’s approach to reducing obtrusiveness (Patterson, 
1989), each sound conforms to an intensity envelope in 
which the sound begins quietly, is sustained for brief 
moment, then fades to silence. We are also devising 
continuous sounds to demark sections of code and help 
identify problems such as wrong number of delimiters or 
wrong number of parameters. When the programmer starts 
a Logo block with a left bracket, for instance, a sound is 
generated until the matching bracket is typed. Nested 
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sections of code cause collections of continuous sounds to 
be layered 

Listening to compilation 

A programmer typically glances periodically at textual 
output from a compiler during the translation process. A 
sound enhanced programming environment should support 
the ability to generate sounds in response to a compiling 
program to prevent the user from having to shift visual 
focus from other activities. In some programming 
environments such as the Macintosh Programmers 
Workshop sounds are used to differentiate successful from 
unsuccessful compiles. However, more sophisticated uses 
of sound can undoubtedly convey even more than binary 
status information. In the “early days” of computing 
intrepid hardware hackers on the TX-2 attached a speaker to 
the index registers. The resulting sound with some training 
was used by programmers to determine the state of running 
processes such as compiles which had limited visual 
feedback. Richer sounds could be based on the same types 
of audio feedback used during the interactive parsing of 
programs as mentioned in the previous section. An 
interesting question is whether meaning could be extracted 
from the emergent sounds generated by this high speed 
parsing of program structure. Since Logo is an interpreted 
language, LogoMedia has no facilities for compilation- 
based sounds. 


LogoMedia 

We have implemented some of the auralization techniques 
outlined above relating to the execution and review of pro¬ 
grams. Sound capabilities were added to a programming 
environment for software visualization developed here at 
the University of Toronto called LogoMotion (Baecker and 
Buchanan, 1990). The goal of the new version dubbed 
LogoMedia is to provide the programmer with simple, non- 
invasive 1 techniques for using sound to aid in the devel¬ 
opment and presentation of arbitrary programs. We are not 
trying to enforce a particular use of sound for auralizing 
programs, but rather provide a variety of tools for the pro¬ 
grammer to use sound easily while developing. 

LogoMedia does not actually synthesize its own 
sounds but rather sends Musical Instrument Digital 
Interface (MIDI) (IMA, 1983) messages to the Macintosh 
MIDI Manager which then relays commands to sound gen¬ 
erating devices attached to the computer. Using MIDI 
gives us the flexibility which we believe is essential for 
auralizing arbitrary programs. The user can generate 
simple musical sounds through a primitive synthesizer 
responding to MIDI commands or sophisticated sampled 
sounds by connecting a different device. Separating the 
sound generation from the programming environment 


Evasive is the term used by Price, Small, and Baecker 
(1991) to categorize software visualization systems which 
require modification to the source code. It seems an 
appropriate descriptor for program auralization systems as 
well. 


using MIDI has reduced the possibility that the auralization 
itself will cause problems in a program being debugged. 

LogoMedia has three new commands for generating 
sound: startnote, playnote, and stopallnotes. 
st art note begins playing a MIDI sound of a given chan¬ 
nel, note number, and volume. To turn off a note, the vol¬ 
ume of zero is used. The playnote command has the addi¬ 
tional parameter of duration specified in 60ths of a second, 
stopallnotes turns off sound in all 16 MIDI channels. 

Specifying the monitoring of execution values 

In order to facilitate the rapid association of sound with 
execution behavior we have devised a tool for mapping 
expressions to sound qualities. The audio expression tool 
tells LogoMedia what sounds to generate during execution 
based on changes in the program information. Each line in 
the table describes how an expression containing one or 
more program variables is related to particular sound quali¬ 
ties of a specified voice. The tool as depicted in Figure 5 
was designed to encourage users to use sound in ways which 
have been successful in previous auralization applications. 
Specifically, the audio expressions tool makes it easy to 
differentiate by timbre a collection of runtime values being 
monitored and discourages users from mapping this data to 
more than one audio dimension. 


Audio EHpressions jUindoup 


Expression 


Dimension 


Channel 


Duration 



m 


Figure 5: LogoMedia's expression-sound table for 
observing changes in program variables 


To prepare a runtime auralization using the audio 
expression tool the programmer enters a Logo expression 
containing variables used in his or her code. We chose to 
map sounds to expressions instead of simply variables to 
allow the user to scale and offset their values to fit the 
range expected by the MIDI devices. Pitch, for instance, is 
specified by MIDI with a value between 0 and 127, yet 
many instruments, samplers in particular, cannot play 
notes which cover this entire range. After entering the 
expression programmer then selects an audio dimension to 
which the value of the expression will be mapped. 
Currently, the only available dimensions are pitch and vol¬ 
ume, although we plan on adding the ability to control 
reverberation, stereo panning, and envelope. The next 
step for the programmer is to select on of 16 MIDI 
channels which determines the timbre of the resulting 
auralization for the expression. Lastly, the programmer 
specifies the duration of the sound as either “continuous” or 
one of three relative time periods—short, medium, and 
long. The continuous selection implies after the 
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initialization of any variable in the expression the MIDI 
channel will play until the program terminates. The 
relative periods have varying duration depending on the 
rate of execution selected by the user. 

Because of the ease of creating meaningless 
cacophony, the audio expression tool was deliberately 
designed to constrain the user to auralization techniques 
that show the most promise. The layout of the window, for 
instance, encourages users to differentiate their 
expressions by timbre and discourages mapping its value 
to different sound channels. Additionally, the design of the 
tool makes it difficult to monitor the same expression 
using more then one audio dimension. A more flexible 
interface which allowed expressions to be entered for each 
sound quality including timbre was rejected, since it 
provided no guidance as to the more effective uses of sound. 
We have also rejected the ability to enter absolute duration 
times, since this reduces the adaptability of the auralization 
to various rates. We believe that the capability to play 
back programs at different speeds we believe is key to 
deriving meaning from auralizations and this is planned to 
be a focus of our user study. 

The constraints imposed by the audio expressions tool 
reflect simply techniques which have been effective in the 
past, but we cannot expect the tool to facilitate all useful 
"Bonifications.” However, the audio expressions tool can 
be used to generate a first approximation of the appropriate 
auralization which can later be refined by the programmer 
using the Logo language. By pressing the "show code" 
button the user can see the underlying Logo code responsi¬ 
ble for auralizing a particular Logo expression. This code 
can then be copied into a LogoMedia document and edited. 

Informal observations from the use of the audio 
expressions tool reveal weaknesses which must be 
addressed in the next iteration of the interface before user 
testing. Scaling and offsetting a variable to fit MIDI 
protocol has proved cumbersome. A more fundamental 
problem is predicting the range of values the variable will 
take on. This was a challenging task even for the original 
author of some simple turtle drawing programs. To expect 
the programmer to know the range of a variable is in a 
sense begging the question: if the programmer knew its 
variance, there might be no point to auralizing. This 
problem is not unknown to the field of software 
visualization in which limits are imposed by the boundary 
of screen coordinates. Our next version of LogoMedia will 
monitor the range of values during each execution of a 
program and present this information to the user to help 
guide the next iteration of the auralization specification. 

Specifying the monitoring of execution events 

In order to provide LogoMedia with simple non-invasive 
techniques for auralizing control flow, we are developing a 
method for associating audio commands called audiopoints 
with individual lines of code. Figure 6 illustrates a sample 
LogoMedia editor window with the Logo code on the right 
and its associated audiopoints on the left. To add an audio¬ 
point to a line of the program the user simply positions the 


cursor in the left column beside the desired line and types a 
Logo statement. Alternately, the programmer after moving 
the cursor over the appropriate column can simply play a 
note on an attached MIDI keyboard. The corresponding 
LogoMedia command for reproducing this note will then 
appear as an audiopoint. As the program is running 
LogoMedia will execute each audiopoint immediately prior 
to the line of code on its right. 



Figure 6: LogoMedia s two column format for 

associating audio with program events 


A simple audiopoint such as on the second line can 
generate a fixed length tone as the code is executed to indi¬ 
cate the line has been reached. Additionally, the program¬ 
mer can call his or her own custom auralization procedures 
or choose from a library of useful sound procedures. We are 
constructing such a library with the ability to toggle 
sounds on and off and layer sounds. The audiopoint on line 
4, for instance, causes drumming sounds to become increas¬ 
ingly dense as the factorial program descends into succes¬ 
sively deeper levels of recursion. As the program pops out 
of the nested levels, the audiopoint on line 5 causes the 
sound to become less dense. 

Conclusions 

Whether sound can be used effectively in programming 
environments remains to be determined. Testing and 
evaluation of LogoMedia is planned to ascertain which 
applications of audio in the program development process 
are most useful. We do not expect sound to replace either 
traditional textual representations or software 
visualizations, but to complement them. Further research 
is needed to determine how all three modalities can be best 
synthesized to take advantage of the multi-media 
capabilities of today’s computers. We hope that by 
enhancing the development environment with sound we 
can make programming more engaging and empower the 
programmer with tools for more accurate entering, faster 
debugging, and an improved understanding of a program’s 
function, structure, and behavior. 
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Abstract 

In recent years many prototype systems have been 
developed for graphically visualizing program execution 
in an attempt to create a better interface between software 
engineers and their programs. Several classification- 
based taxonomies have been proposed to describe 
computer program visualization systems and general 
frameworks have been suggested for implementation. In 
this paper we provide a framework for both describing 
existing systems and implementing new ones. We 
demonstrate the utility of automatic program visualization 
by re-implementing a number of classic systems using 
this framework. 

R€sum£ 

R6cemment on a developpe beaucoup de systcmes 
prototype pour visualiser graphiquement l’exdcution d'un 
programme afin de crder une meilleure interface entre les 
cr^ateurs de logiciel et leurs programmes. Plusieurs 
taxonomies bashes sur une classification ont 6i6 
proposees pour d£crire des syst&mes de visualization de 
programmes et des cadres g£n£raux ont 6t& proposes pour 
Jeur realisation. Dans ce papier nous foumissons un cadre 
dans lequel on peut d&rire des sytemes actuels et ex6cuter 
de nouveaux syst&mes. On montre Tutilite de visualization 
automatique de programmes en r6-ex£cutant quelques 
anciens systfcmcs dans le cadre de ce prototype. 
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Introduction: 

What is Software Visualization? 

The tools traditionally used by software engineers to help 
monitor and analyse program execution have been plain 
ASCII-text based debugging environments which usually 
allow the user to trace the currently executing code, stop 
and start execution at arbitrary points, and examine the 
contents of data structures. Although they can be 
understood by experts, these tools have a limited peda¬ 
gogic value and by the early 1980's the work of Baecker 
and Sherman (1981) and Brown (1988) showed how 
algorithms could be animated with cartoon-like displays 
that show a high level abstraction of a program's code and 
data (Brown referred to this as “algorithm animation"). 

A concurrent development in the mid-1980's was the ap¬ 
pearance of systems which displayed graphical representa¬ 
tions that were more tightly coupled with a program’s code 
or data and showed more or less faithful representations of 
the code as it was executing. Although the displays were 
not as rich as the custom built algorithm animations, 
these systems were closer to the tools that software engi¬ 
neers might use. These “program animators" together with 
the algorithm animators became known as “program visu¬ 
alization" systems. We prefer the more descriptive term 
“software visualization” (Price, Small, & Baecker, 1992) 
which encompasses both algorithm and program 
visualization as well as the visualization of multi-program 
software systems. In this paper we will use the term 
software visualization (SV) to describe systems that use 
visual (and other) media to enhance one programmer's 
understanding of another's work (or his own). 

Classifying SV Systems 

One of the first taxonomies of SV was that of Myers 
(1986) (updated later as (Myers, 1990)), which served to 
differentiate SV, visual programming, and programming- 
by-example. In classifying SV systems, Myers used only 
two dimensions: static vs. dynamic and code vs. data. The 
first dimension is based on the style of implementation; 
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static displays show one or more motionless graphics 
representing the state of the program at a particular point 
in time while animated displays show an image which 
changes as the program executes. The second dimension 
describes the type of data being visualized, be it the 
program source code or its data structures. 

The taxonomies that have been proposed since Myers 
have also used few dimensions, which seems to ignore that 
fact that there are many styles for implementation and in¬ 
teraction as well as different machine architectures and 
ways of utilizing them. Price, Small, and Baecker (1992) 
recently proposed a taxonomy which describes 6 broad 
categories for classifying SV systems: scope, content, 
form, method, interaction, and effectiveness. Each of 
these categories has between three and seven 
characteristics, for a total of thirty dimensions to describe 
each system. This pragmatic classification system 
provides a means for comparing the functionality and 
performance of a wide range of SV systems, but it does not 
provide a language or framework for implementing new 
systems. Eisenstadt et al. (1990) described nine qualitative 
dimensions of visual computing environments which can 
form the basis of a language for describing SV systems, 
but these serve only to describe the attributes of systems 
rather than drive their construction. 

From Taxonomy to Framework and System: 
“what goes on?” 

Taxonomies are useful, but we need more if we are to pro¬ 
vide a firm basis upon which to describe SV systems in 
depth, let alone implement them. A framework for describ¬ 
ing SV systems could provide extra leverage by being a 
little more prescriptive, i.e. making a commitment regard¬ 
ing how to approach the design and construction of SV 
systems. In fact, it is not a very big step from specifying 
such a framework to designing a system for building a SV 
system (SV system-building system). An important differ¬ 
ence is that the former activity is merely a paper exercise, 
whereas the latter activity is intended to lead towards a 
working tool. Indeed, the latter activity serves as a useful 
forcing function: it encourages us to build re-usable li¬ 
braries of software that we believe encapsulate important 
generalizations about SV system-building. The proof of 
the soundness of a design built in this way lies in the abil¬ 
ity to use it both to reverse-engineer existing SV systems 
and construct new systems with ease. 

Programming Language Visualization vs 
Algorithm Animation 

Several of the noteworthy SV building systems and frame¬ 
works focus on algorithm animation, which means that 
the animations that they produce are custom designed and 
each new program requires manual annotation to animate 
it. Programs are animated in BALSA (Brown, 1988) by 
adding calls to the animation system at “interesting 
events" in the code. Systems implemented by Stasko 
(1990) and London and Duisberg (1985) provide facilities 


for smooth transitions in animations based on 
“interesting event" calls. 

In our work, we have focussed on supporting the 
construction of systems which visualize the execution of 
programming languages. By visualizing a programming 
language interpreter (or compiler) one also, in some 
sense, automatically gets a visualization for any program 
written in that language (thus achieving the automatic 
goal suggested by Price, Small, & Baecker (1992)). 
Programming language visualization (PLV) and algorithm 
animation (AA) overlap, but there are differences in the 
approach. AA systems typically show a very high level 
picture of a program’s execution and the images that it 
generates can be far removed from the data structures 
contained in the program. The animations cover a narrow 
set of programs (typically a single algorithm). PLVs on 
the other hand have to deal with any program which can be 
realised in the language. Thus PLV displays usually have 
much simpler images than AA displays since they must be 
highly generalized whereas AA displays can be custom 
tuned. The problem for an AA system is to show the 
characteristics (signature) of an algorithm as clearly as 
possible. The problem for a PLV is to allow arbitrarily 
large execution spaces to be examined in a 
comprehensible fashion. 

Our approach is to concentrate primarily on PLV, but to 
provide generalizations which arc applicable to AA as 
well. In the rest of this paper we describe the design of a 
SV system-building system (and framework) called Viz, 
which we have implemented as a prototype running in 
Common Lisp and CLOS on Sun workstations. Our Viz 
implementation has already been used to reconstruct three 
well known PLV systems: an OPS-5 visualization system 
(based on TRI (Domingue & Eisenstadt, 1989)), a Prolog 
visualization system (based on TPM (Eisenstadt & 
Brayshaw, 1988)), and a Lisp tracer (based on the 
Symbolics™ tracer). After describing the Viz architecture, 
we explain how one of these reconstructions was 
implemented. In order to explore the relationship between 
Viz’s PLV-oriented approach and AA-oriented systems we 
have also used Viz to implement some of the animations 
from Brown’s BALSA (sorting) and Stasko’s TANGO (bin¬ 
packing). We conclude with a comparison of the termi¬ 
nology used in BALSA, TANGO, and Viz to describe ab¬ 
stractions and we highlight aspects of the Viz design. 

Viz Architecture 

In Viz, we consider program execution to be a scries of 
history events happening to (or perpetrated by) players . 
To allow SV system builders considerable freedom, a 
player can be any part of a program, such as a function, a 
data structure, or a line of code. Each player has a name and 
is in some state , which may change when a history event 
occurs for that player. A player may also contain other 
players, enabling groups of players to be formed. History 
events are like Brown's “interesting events” in BALSA— 
each event corresponds to some code being executed in the 
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Figure 1. The Architecture of Viz 

program or some data changing its value. These events are 
recorded in the history module, which allows them to be 
accessed by the user and “replayed.” Events and states are 
mapped into a visual representation which is accessible to 
the end-user (the programmers who need to use the SV sys¬ 
tem, not the SV system builder). But the mapping is not 
just a question of storing pixel patterns to correspond to 
different events and states—we also need to specify differ¬ 
ent views, and ways of navigating around them. The main 
ingredients of Viz are: 

• Histories: a record of key events that occur over 
time as the program runs, with each event 
belonging to a player; each event is linked to 
some part of the code and may cause a player to 
change its state (there is also some pre-history 
information available before the program begins 
running, such as the static program source code 
hierarchy and initial player states). 

• Views: the style in which a particular set of play¬ 
ers, states or events is presented, such as using 
text, a tree, or a plotted graph; each view uses its 
own style and emphasizes a particular dimension 
of the data that it is displaying. 

• Mappings: the encodings used by a player to 
show its state changes in diagrammatic or textual 
form on a view using some kind of graphical 
language, typography, or sound; some of a 
player’s mappings may be for the exclusive use of 
its navigators. 

• Navigators: the tools or techniques making up 
the interface that allows the user to traverse a 
view, move between multiple views, change 
scale, compress or expand objects, and move 
forward or backward in time through the 
histories. 


This framework is equally at home dealing with either pro¬ 
gram code or algorithms, since a player and its history 
events may represent anything from a low-level (program 
code) abstraction such as “invoke a function call” to a 
high level (algorithm) abstraction such as “insert a 
pointer into a hash table.” 

Figure 1 shows the general architecture of Viz. The target 
system source code is annotated to generate history calls. 
When the system being visualized is a programming lan¬ 
guage, hooks into the interpreter or compiler are used to 
generate history events. As the code executes, the inserted 
calls cause “interesting events” regarding players to be 
recorded in the history module. 

When the user runs the visualization, the view module 
reads the history data at the request of the navigator. The 
view module sets the layout of the history events and 
sends local coordinates for each history datum through the 
mapping module, which draws a graphical or textual repre¬ 
sentation for each event. The screen images are then trans¬ 
formed and presented on the screen by the navigator. The 
user interacts with the visualization using the navigator, 
which sends control signals to the view module to cause 
all changes in the visualization, such as panning, zoom¬ 
ing, local compression and expansion, and moving for¬ 
ward and backward in time through the program execution 
space. 

Histories 

The First task for the visualization programmer using Viz 
is to decide what types of events may occur during program 
execution, which elements in the program will be the 
players and how the players change state. After defining 
these, the programmer may insert create new player and 
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note event calls in the code, which form the interface be¬ 
tween a program and its visualization. 

Figure 2 shows a prototypical history structure in the his¬ 
tory module. This consists of a set of players and a se¬ 
quence of history states. Each player has a name, a pointer 
to its current history state and a pushdown stack of previ¬ 
ous states. A player may contain other players, as shown 
by player 5 in figure 2. This feature is useful in naviga¬ 
tion. Each state has a timestamp, a pointer to the appro¬ 
priate segment of source code and an event structure. As a 
program executes, new players and history states are cre¬ 
ated, and existing players are “moved” into new states, 
pushing previous states onto a stack. The various states of 
the players are caused by the different types of events. 

The choice of players and event types together with the ju¬ 
dicious placement of note event calls in code determine the 
execution model. Currently, we do not advocate any 
methodology for creating the execution model, except to 
point out that events are the “things that happen” in a 
program causing a player or players to move into a partic¬ 
ular state. 

Views 

A view can be thought of as a perspective or window on 
some aspect of a program or algorithm, with (possibly) 
many views making up a visualization. There are some 
similarities between our views and the animation views, 
adapted from the Model-View-Controller paradigm, de¬ 


scribed by London and Duisberg (1985). The main 
difference is that within the animation views, the layout, 
handled by views in Viz, and appearance, handled by map¬ 
pings in Viz, are handled together. Each view in Viz can be 
thought of as embodying a style of formatting collections 
of objects. Within Viz we have constructed a hierarchy of 
views, including text, graph, table and tree based views, 
each embodying a particular layout style, which can be 
used or specialized by the SV builder. 

A view requests data from the history module and sends it 
to the mapping module, which decides the appearance. The 
view module then tells the mapping module where to dis¬ 
play the mapped data. This means that the view module is 
concerned only with the position of the history data item, 
not its appearance. 

The view module is also responsible for managing the 
compression (ellision) and expansion of elements in the 
display based on user commands from the navigator. If the 
user selects the compression of display elements, such as a 
subtree in a tree hierarchy, then the view module groups 
the players concerned into a new player which has its own 
mapping to represent the compressed players. When the 
navigator tells the view module that an element is to be 
expanded, the view disbands that new player and displays 
the individual mappings for the players. 


Graphics Interface ’92 














































57 



Prolog 

OPS5 

Lisp 

Sort 

Bin-Packing 

Player 

predicate instantia¬ 
tion (goal) 

rule 

form 

data item 

data item 

bin 

States 

pending goal; 

succeeded; 

failed; 

failed on backtrack¬ 
ing; 

redo-goal 

failing to match 
working memory; 
firing 

unevaluated; 

evaluated 

location 

attempting to fit; 
suceeded; new 

Events 

call; 

exit; 

fail-lst; 

fail-nth; 

redo 

choose for firing 

call; 

return 

assignment of 
item to cell 

attempt-fit 

succeed-fit 

new-item 

Mappings 


or 

% 

+ 

-> italic ; 

<- bold 

• 

Qll 

Views 
(in order of 
decreasing 
granular¬ 
ity) 

tree: players, play¬ 
ers current state; 
formatted text 

table: players vs. 
cycles, 

player's state @ 
cycle 

formatted text 

pretty printed 
code: player's 
current state 

point plot: play¬ 
ers, 

player’s value & 
current state; 
formatted text 

point plot using 
rectangles: bin- 
players and cur¬ 
rent state 


Table Is A Viz description of five example systems 


Mappings 

The goal of a mapping in Viz is to communicate the max¬ 
imum amount of information about a player's state while 
imposing the least possible cognitive load on the user. 

In conjecturing a theory of effectiveness of graphical lan¬ 
guages, Mackinlay (1986) noted Cleveland and McGill’s 
observation that people accomplish the perceptual tasks 
associated with the interpretation of graphical 
presentations with different degrees of accuracy. Using 
psychophysical results, Mackinlay extended Cleveland 
and McGill's work to show how different graphical tech¬ 
niques ranked in perceptual effectiveness for encoding 
quantitative, ordinal, and nominal data. He found that the 
position of the data item in the x-y plane is ranked first for 
all three types of data, which is why we separate the view 
layout from the mappings. The other techniques which 
may be varied to create an effective mapping are (in de¬ 
creasing order of effectiveness): colour hue, texture, con¬ 
nection, containment, density (brightness), colour satura¬ 
tion, shape, length (size), angle or slope (orientation), 
and area (or volume). 

A mapping in Viz is attached to a particular type of player, 
event or state, and view. Multi-method inheritance occurs 
over the class of entity and view, allowing a Viz user to 
formulate expressions such as “all entities in view-x are to 
be displayed as a filled triangle”, “entity-y is always dis¬ 
played as a white circle” and “enlity-a is displayed as a cir¬ 
cle in tree based views but as a square in all other views”. 
Mappings can be inherited, forming an inheritance hierar¬ 


chy in much the same fashion as views. Our future work in 
Viz will create a library of mappings. 

Navigators 

The Viz navigator module encapsulates the interface be¬ 
tween the user and the visualization, although the methods 
for performing the navigation tasks are found in the view 
module, thus allowing custom navigation interfaces to be 
built independently of the task. 

Our prototype provides a replay panel (see the screen 
snapshot in figure 3) for searching, which has buttons for 
moving to the beginning or end of the animation, single¬ 
stepping forward or backward, playing forward, fast-for¬ 
warding and stopping. Stepping in Viz involves notifying 
the history and view modules of the change of focus (the 
history module then selects the next appropriate event). 
Horizontal and vertical scroll bars are provided for pan¬ 
ning while simple zoom in and zoom out buttons provide 
scaling. The user can select a fine grained view of a data 
element by clicking on it. 

Examples Defined in Viz 

The descriptions of the three systems that we have imple¬ 
mented using Viz are presented in table 1 along with the 
two examples from BALSA and TANGO animations. The 
table provides a summary of the players, states, events, 
mappings, and views used in each visualization. Each row 
represents a distinct Viz entity type and each column rep¬ 
resents one of the visualizations. The player row list* the 
players which can take part in each example. The states 
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Figure 3: Screen Snapshot of Prolog Visualizer (greyscale from a colour screen) 


row shows the possible states players can enter. The 
events row shows the events which cause state changes. 
The mappings row contains, in order, the icon mapping 
for each state. The views row lists the names of the possi¬ 
ble views in decreasing order of granularity. The connec¬ 
tion between a view and the history is also shown. We 
shall now explain the first column, the Prolog visualizer, 
in detail. 


A Prolog Visualizer 

The Prolog visualizer is based on the Transparent Prolog 
Machine (TPM) (Eisenstadt & Brayshaw, 1988). TPM uses 
an AND-OR tree representation where the nodes represent 
goals which are instantiated Prolog predicates, and the 
arcs represent conjunctions or disjunctions of subgoals. 

The players in the visualization are the instantiated Prolog 
predicates or goals in the proof tree. The events, which are 
adapted from the Byrd Box Model (Byrd, 1980), are: call 
(trying to prove a goal), exit (a goal succeeding), fail-1st 
(a goal failing the first time attempted), fail-nth (a goal 
having succeeded earlier, later failing on backtracking), 
and redo (re-attempting to satisfy a goal). There is a corre¬ 


sponding state and mapping for each event type (shown in 


respective order so **13 equals call, ^Jequals exit, etc.) 


The Prolog interpreter takes a list of goals left to prove. 
When no goals are left the environment is returned. The 
algorithm (adapted from (Nilsson, 1984)) for the inter¬ 
preter is: 

If there is nothing to prove 
then return the environment; else 
if the first-goal-left-to-prove 
is true, then 

note event: succeed, the goal that 
was proved, env, 
and prove the 
remaining goals; else 
note event: goal, 

first-goal-left-to-prove, 
env, and create a player for 
the-first-goa1-left-t o-prove, 
and 

loop for each clause in the database 
if the head of the clause 
matches the first goal then 
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create a player for 
each of the subgoals 
in the clause 

if we prove the new list of 
goals (which is the 
body of the matched 
clause appended to the 
rest of the goals) 
then return the new 
environment else 
note event: redo, 
first-goal-left-to-prove, env, 
note event: failure, 

first-goal-left-to-prove, env, 
and 

return failed-to-prove 

first-goal-left-to-prove . 

In the above, the algorithm is shown in italics while the 

Viz event calls are shown in plain text. 

Figure 3 shows a screen snapshot of the proof for the goal 

?- desperate(?x) given the following Prolog database: 

desperate(?x) :- 
name-dropper(?x), 
unemployed(?x). 

name-dropper (?x) knows(?x, ?y), 

famous (?y). 

name-dropper (?x) knows (?y, ?x), 

famous (?y). 

knows(joe, mick). 

knows(Charles, fred). 


famous(mick). 
famous(Charles) . 
unemployed(fred) . 

Bearing in mind that atoms beginning with “?” depict 
variables in this approximation of Edinburgh-syntax 
Prolog, the first five lines of the code above defines a) the 
relation that someone is desperate if they are name dropper 
and unemployed; and b) a person is a name dropper if they 
know someone who is famous or if someone famous 
knows them. The rest of the code defines five “facts” about 
who knows who, who is famous and who is unemployed. 

Because history events are invoked by inserting hooks 
into our own Prolog interpreter, the Viz implementation is 
straightforward once the machinery for players, views, 
mappings, and navigators is in place. The simple imple¬ 
mentation, as described here, can deal with non-trivial 
cases of tricky backtracking and unification. Coping with 
arbitrarily large proof trees requires the definition of a 
“collapsed predicate” player. The tree beneath a collapsed 
predicate player would not be displayed unless requested 
(by clicking on it with the mouse). The current set of col¬ 
lapsed predicates would be chosen by the user. This col¬ 
lapsed predicate set would correspond to the segments of 
code the user deemed irrelevant and thus could be “black 
boxed away”. In order for collapsed predicates to be dis¬ 
tinguishable, the mapping for a collapsed predicate player 
would be a triangle. The collapsed predicate player would 
in fact contain the players within its subtree. A request to 
show the full sub-tree would result in the replacement of 
the single collapsed predicate player with the players it 
contained. 


BALSA 

TANGO 

Viz 

Comments 

Interesting 

(Algorithm) 

Events 

Algorithm 

Operations 

Events and 

Create 

Players 

The BALSA and TANGO terms are virtually identical while Viz 
events can be arranged hierarchically, and are designed to relate 
to the code rather than the algorithm. 

Modellers 

Image, 
Location, 
Path, and 
Transition 

States and 
Players 

In describing a visualization’s internal representation, TANGO 
adds to the BALSA framework by providing 4 abstract data 
types (geared towards animation); Viz’s states and players are 
program execution level abstractions. 

Renderers 

Animation 

Scenes 

Mappings 
and Views 

BALSA provides a general mechanism for each view while 
TANGO provides reusable libraries of animation scenes; Viz 
discriminates between the actual images that are mapped to the 
screen and the style in which they are displayed (the view). 



Navigators 

BALSA and TANGO don’t specify any kind of user interface in¬ 
teractions within the framework, nor techniques for dealing with 
arbitrarily large programs. 

Adaptor and 

Update 

Messages 


History 

The Viz history is a structure for the collection of events, states, 
and players generated during program execution. The history 
module includes various searching and filtering functions. 


Table 2: A Comparison of Terminology 
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Additional Systems 

To fully exercise our evolving framework across a range of 
players, events, states, mappings, and views, we have 
also used Viz to re-implement the textual visualization 
provided by the Symbolics™ Lisp stepper/tracer which 
uses layout to summarize the execution history of the Lisp 
evaluator (the Viz implementation actually improves on 
this by using colour and typography as well) and the table 
based vizualization of an OPS-5 style rule interpreter. We 
have also duplicated some of the well known AA examples 
from BALSA (sorting) and TANGO (bin packing) to show 
that the system can be used to easily construct custom al¬ 
gorithm animations as well. 

A Comparison of Viz, Balsa, and Tango 
Terminology 

Although each of the goals of Viz, BALSA, and TANGO are 
somewhat different, the importance of the pioneering 
work of BALSA and TANGO is such that a close compari¬ 
son of terminology is warranted. Viz terminology is de¬ 
signed to allow existing systems to be described as well as 
implement new ones. Table 2 shows the systems in left- 
to-right chronological order, mapping the similar termi¬ 
nology across systems where appropriate, and highlight¬ 
ing differences accordingly in the “comment” column. 

Conclusions 

The main goal in designing Viz was to provide a descrip¬ 
tive mechanism for understanding and explaining the di¬ 
verse notations and methodologies underlying existing 
software visualization environments. Our re-implementa¬ 
tion based approach is in contrast to the current literature 
(Eisenstadt et al., 1990; Green, 1989; Green, 1990; 
Myers, 1990; Price et al., 1992) which focusses on cogni¬ 
tive and notational dimensions and practical categories. 
The amount of effort involved in our approach is of the 
same order of magnitude as category or dimension based 
approaches. Each of the example systems was constructed 
within 1-2 days (this included many extensive alterations 
to the first version of Viz) and is of the order of 100 lines 
of code. 

By providing a descriptive abstraction for internally ex¬ 
pressing the state of an algorithm (players, events, and 
states) we have augmented earlier frameworks and added to 
the common language for representing algorithm anima¬ 
tion designs. Since Viz provides visualization facilities 
for programming languages, we have provided a frame¬ 
work for generalized visualization that is applicable to 
software engineers since it provides visualizations that are 
automatic and faithful to the execution model of the lan¬ 
guage. The use of Viz to implement systems which differ 
widely in terms of their scope, content, form, method, in¬ 
teraction, and effectiveness, suggests that the framework 
is sufficient to design and implement a wide class of soft¬ 
ware visualization systems. 
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Abstract 

The Fourier projection-slice theorem states that the inverse 
transform of a slice extracted from the frequency domain 
representation of a volume yields a projection of the volume 
in a direction perpendicular to the slice. This theorem allows 
the generation of attenuation-only renderings of volume data 
in 0(N 2 log TV) time for a volume of size N 3 . In this paper, 
we show how more realistic renderings can be generated 
using a class of shading models whose terms are Fourier pro¬ 
jections. Models are derived for rendering depth cueing by 
linear attenuation of variable energy emitters and for render¬ 
ing directional shading by Lambertian reflection with hemis¬ 
pherical illumination. While the resulting images do not exhi¬ 
bit the occlusion that is characteristic of conventional volume 
rendering, they provide sufficient depth and shape cues to 
give a strong illusion that occlusion exists. 

Keywords: Volume rendering, Fourier projections, Shading 
models, Scientific visualization. Medical imaging 

1. Introduction 

Volume Tendering is a technique for visualizing a sam¬ 
pled scalar or vector volume of three spatial dimensions by 
modeling the propagation of light in a participating medium. 
In its most general form, parameters of the data are mapped to 
physical parameters of the medium such as energy density, 
absorption and scattering coefficients, and scattering phase 
function [Krueger90]. The integro-differential equation 
governing light transfer is then solved to obtain an energy 
equilibrium for the volume [Siegal81], 

To reduce solution costs, practical volume rendering 
algorithms assume a low albedo, allowing them to ignore the 
effects of interreflection [Levoy88, Sabella88, Drebin88]. 
The resulting equation can be evaluated independently for 
each viewing ray using an expression of the form 
- 

/ = J fit) exp[-J g(u) du]dt (1.1) 


where t represents distance along the ray. This expression is 
typically evaluated by digital compositing [Porter84], which 
in the context of volume rendering is equivalent to numerical 
integration using the rectangle rule. Given a cubical volume 
measuring N voxels on a side, and assuming pixel spacing 
equal to voxel spacing, the cost of generating an image using 
these algorithms is O (V 3 ). 

If we simplify the physical model still further by ignor¬ 
ing emission and first scatter, we produce an image that looks 
like an X-ray. The expression for each viewing ray now takes 
the form 

/ = exp[-Jg(r)<*]. (1.2) 

For an imaging geometry that employs uniformly 
spaced parallel rays, expressions of this form can be evaluated 
efficiently using the Fourier projection-slice theorem. This 
theorem allows us to compute integrals over volumes by 
extracting slices from a frequency domain representation of 
the volume. The application of this theorem to image syn¬ 
thesis has been independently proposed in [Dunne90] and 
[Malzbender]. An application to MR angiography is 
described in [Napel91]. 

The computational advantage of the projection-slice 
theorem is, at least in theory, considerable. Given a cubical 
volume and pixel spacing equal to voxel spacing as above, the 
cost per image is O ( N 2 log N) after an O (/V 3 log N ) prepro¬ 
cessing step to compute the 3D frequency domain representa¬ 
tion. For a volume and image of width N = 256, volume 
rendering using digital compositing requires 16M operations, 
while rendering using the projection-slice theorem requires 
only 512 K operations, a speedup of more than an order of 
magnitude. 

There are two disadvantages to this approach. From a 
practical standpoint, signal processing considerations have 
thus far prevented researchers from obtaining speedups 
matching the theoretical complexity. We expect these prob¬ 
lems to be solvable, and, aside from a brief explanation in 
section 2, we do not address them further in this paper. A 
more fundamental disadvantage of the Fourier approach is 
that the resulting images do not exhibit occlusion and are con¬ 
sequently not as useful as volume renderings for many appli¬ 
cations. Fortunately, occlusion is only one of many cues 
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employed by the human visual system to determine the shape 
and spatial relationships of objects. Other cues include per¬ 
spective, shading, texture, shadows, atmospheric attenuation, 
stereopsis, binocular convergence, ocular accommodation, 
head motion parallax, and the kinetic depth effect 

In this paper, we consider techniques for augmenting 
equation 1.2 to include as many depth and shape cues as pos¬ 
sible while retaining the computational efficiency of evalua¬ 
tion using the projection-slice theorem. The key restriction 
placed on us is that, to avoid recomputing the 3D frequency 
domain representation for each view, the integrand must be 
independent of viewing direction. If we wish to support rota¬ 
tion of the volume relative to a light source, the integrand 
must also be independent of lighting direction. A class of 
shading models that satisfies these constraints has the form for 
each viewing ray 

/ = i> 

i =o 

for scalars w { and scalar volumes /•. As we shall see, a 
surprisingly large class of shading models are expressible as 
linear combinations of this form where the fi's are indepen¬ 
dent of both viewing and lighting directions. In particular, we 
consider the following three models: 

• X-rays with depth cueing 

• X-rays with directional shading 

• X-rays with depth cueing & directional shading 

The remainder of the paper is organized as follows. 
Section 2 describes the Fourier projection-slice theorem and 
gives an algorithm for using it to generate arbitrarily oriented 
X-rays of a sampled scalar volume. In section 3, we derive 
formulations for the three shading models listed above plus a 
few extensions. Section 4 describes our implementation and 
results, and section 5 gives conclusions and suggests direc¬ 
tions for future research. 


jMOdl 


(13) 


2. The projection-slice theorem 

The Fourier projection-slice theorem states that the 
inverse 2D Fourier transform of a slice passing through the 
origin of the 3D Fourier transform of a function gives a pro¬ 
jected image of the function, where the projection is in a 
direction perpendicular to the extracted slice. Each point on 
the projection is the integral of the function over the infinite 
ray passing through that point and normal to the slice 
[Dudgeon84]. For later use, we define a Fourier projection 
operator n v with input function / on R 3 and output image I 
on R 2 such that 

/ = n W) = FUFJf}*v}- (21) 

5y restricts the spectrum to a plane passing through the origin 
and perpendicular to viewing direction V, and p„ and P N 
are the forward and inverse N-dimensional Fourier 
transforms, respectively. 

The Fourier projection-slice theorem is the basis for 
one variant of computed tomography (CT). Specifically, if 
the 2D Fourier transforms of a set of X-rays of an object taken 
at different angles are inserted into a 3D spectrum such that 
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Figure 1: Algorithm for generating an attenuation- 
only rendering of a volume dataset using the Fourier 
projection-slice theorem. 

each slice passes through the origin and is perpendicular to 
the (parallel) X-ray beam used to acquire that slice, the 
inverse 3D Fourier transform of the composite spectrum 
yields a 3D radiometric facsimile of the object, i.e. a CT 
dataset [Macovski83]. 

By reversing the CT acquisition process, we can gen¬ 
erate X-rays from CT data. A typical implementation is 
shown in figure 1. In a preprocessing step, we compute the 
3D discrete Fourier transform (DFT) of a scalar volume sam¬ 
pled on the three-dimensional integer lattice. For real 
volumes (i.e. having no imaginary component), a discrete 
Hartley transform (DHT) [Bracewell86] may be substituted. 
For each view, we extract a slice passing through the origin 
and compute its inverse 3D DFT. Since an arbitrarily 
oriented slicing plane does not typically pass through points 
on the integer lattice, slice values must be computed by inter¬ 
polation from nearby samples. 

For a volume measuring N voxels on a side where N is 
an integer power of two, the computation of a 3D spectrum 
using a Fast Fourier Transform (FFT) or Fast Hartley 
Transform (FHT) algorithm requires 0(N 3 log N) operations. 
The cost of extracting a 2D slice measuring N samples on a 
side from the 3D spectrum is O ( W 3 N 2 ) where W is the 
nonzero width of the interpolating filter. To complete the 
algorithm, we perform a 2D inverse FFT or FHT, which 
requires 0(N 2 log N) operations. Properly speaking, sam¬ 
pling theory demands that we employ 2x oversampling during 
slice extraction. We have found that the penalty of not doing 
so is minor for the datasets we typically encounter. Assuming 
N = 256 and W = 4, we have slice extraction and inverse 
transformation costs of 16Af and 2.25M operations, respec¬ 
tively. Therefore, although the Fourier transformation is 
asymptotically dominant, the slice extraction step usually 
takes more time. 

The data structures required by the algorithm (in addi¬ 
tion to the input volume) include a 3D spectrum measuring N 
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Figure 2: Imperfect reconstruction in the frequency 
domain produces incompletely suppressed replicas in 
the spatial domain. These appear in the image as 
noticeable ghosting. 

coefficients on a side and a 2D slice that also measures N 
coefficients on a side. While the input volume may be ade¬ 
quately represented by 8 bits per voxel, accurate frequency 
domain representations typically require 32-bit floating point 
coefficients, thereby increasing memory requirements by a 
factor of 4. Quantization or coding of Fourier coefficients 
such as specified in the JPEG still picture compression stan¬ 
dard is possible, but reconstruction time must be considered. 

The use of finite support interpolation filters in the 
slice extraction step can give rise to objectionable artifacts in 
Fourier projections. Although these problems are not the 
focus of this paper, they impact performance significantly, so 
a brief discussion of them is warranted. 

Convolution of a sampled function with a filter of finite 
support causes incomplete suppression of periodic replicas in 
the other domain. Usually, convolution is performed in the 
spatial domain and the incompletely suppressed replicas occur 
in the frequency domain. In the Fourier projection algorithm, 
we convolve in the frequency domain, producing incom¬ 
pletely suppressed replicas in the spatial domain that appear 
as noticeable ghosting. The problem is summarized in one 
lower dimension in figure 2. Two solutions are known for 
this problem: 

(1) Employ a better (and typically wider) interpolation 
filter. If we increase the nonzero filter width by a fac¬ 
tor of k |, k x > 1, we increase per-view processing time 
by a factor of ki*. 

(2) Widen the separation between spatial domain replicas 
by zero padding the input volume. If we increase the 
volume width by a factor of k 2> which corresponds to 
surrounding the data on each of its six sides by a mar¬ 
gin of zeros of width N ( k 2 -1)/2, we increase per- 
view processing time by a factor of k 2 2 . We also 
increase memory requirements by a factor of k 2 and 
preprocessing time by a factor of 


k 2 3 (tog *2 + log N) / log N, but these effects are less 
important 

The second solution has theoretically superior per-view per¬ 
formance. The relative impact of these two solutions on 
image quality has not been studied, however. Malzbender 
reports satisfactory results with a 5 3 tap filter and a 10% mar¬ 
gin of zeros on each side of the data [Malzbender]. The 
images in this paper were made using a 16% margin 
(k 2 = 4/3) and a 4* tap filter. 


3. Shading models 

As discussed in the introduction, the class of integrals 
that can be directly evaluated using the Fourier projection- 
slice theorem does not allow the modeling of emission and 
scattering in a participating medium. In this section, we con¬ 
sider several shading models that are linear combinations of 
Fourier projections in the spirit of equation 1.3 and which 
restore some of the lost visual cues. 


3.1. Depth cueing 

The fraction of light lost to absorption or scattering in a 
participating medium is exponentially related to the distance it 
travels. This attenuation provides strong perceptual cues con¬ 
cerning the relative depth of objects in the scene. In this sec¬ 
tion, we develop an approximate shading model in which 
exponential attenuation with locally varying absorption and 
scattering coefficients is replaced by linear attenuation with a 
constant coefficient. It will be shown that images based on 
this model can be rendered efficiently using the projection- 
slice theorem. 

Let us begin with exponential attenuation. We define a 
volume of density p(x,y,z) and color C(x,y,z) to emit light 
with an energy of Cp per unit length and absorb light with an 
opacity of x p per unit length where x is a constant We also 
define a viewing ray P(f) parameterized by length t and two 
points P(a) and P(b) lying on the ray. Ignoring scattering, the 
total energy I P arriving at P(b) due to emission and absorp¬ 
tion along that portion of the ray lying between P(a) and P(b) 
is given by 

* 

b = J P (P(0) C (P(/» exp[- J T p (P(«)) du ] dt. (3.1.1) 

a a 

The notation is adapted from [Max90]. Replacing exponen¬ 
tial attenuation with locally varying coefficients x p by linear 
attenuation with constant coefficient x gives 

b 

b= j p (P(0) c (P(i)) [x/2 + t/2 (D (P(/)) • V)] dt (3.1.2) 

a 

where D = (D xt D y ,D x ) is a linear function of position along 
the ray, i.e. D = m P + b for constants m and b, and 
V = {V„V r V t ) is the normalized viewing direction as shown 
in figure 3. Note that V points away from the eye, not 
towards it as is common in many textbooks. We have chosen 
this convention in order to underscore the similarity between 
equation 3.1.2 and equation 3.2.3 in section 3.2. 

In the introduction, we stated that in order to avoid 
recomputing the 3D frequency domain representation for each 
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Figure 3: Linear depth cueing for a volume centered 
at the origin is given by D = m P + b for constants m 
and b where P(0 is position along a viewing ray and V 
is the normalized viewing direction. 

view, all integrals over the volume must be independent of 
viewing direction. Looking back at the last equation, we 
observe that viewing direction V is independent of ray param¬ 
eter t. We may therefore factor it out of the integral, produc¬ 
ing 

b 

Ip=V,j p (P(0) c (P(0) [T/2 + in (D,(P(l)))] dt 

c 

b 

+ v, I p (P(r» C(P(/)) [V2 + i12 (£>,(P(/)))1 dt 

l (3.1.3) 

+ v,j p <P(o) c(P(/)) [in+in (D,(P(0))1 dt 

a 

b 

+ c, jp(P(/))C(P(/))<*. 

a 

where Cj = x/2 -1/2 (V x +V y +V g ). 

These four integrals are now in a form that can 
evaluated using the Fourier projection-slice theorem. Apply¬ 
ing the projection operator defined in equation 2.1, we obtain 
the following expression for output image / 

I = v x n v [(x/2 +1/2 D x (x,y, z» p (x,y,z) C (*,)>,z)] 

+ V, n v [(V2 + V2D y (x,y,z))p (x,y,z) C (x,y,z)] ^ 

+ v, n v [(t/2 + 1/2 D,(x,y,z)) p (x,y,z) C(x,y,z)] 

+ ci rivlp (x,y,z)C(x,y,z)]. 

The algorithm is shown in figure 4. Four copies of 
p (x,y,z) C(x,y 9 z) are created. Each is multiplied by the 
appropriate depth cueing function and Fourier transformed, 
yielding four 3D spectra. The first three spectra represent 
volumes that are depth cued along one of object space 
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Figure 4: Algorithm for generating a depth cued 
rendering of a volume dataset During 
precomputation, volume p C is multiplied by depth 
operator D. For each view, the extracted slices are 
multiplied by viewing direction V. 

coordinate axes. The fourth spectrum represents the original 
volume. For each view, slices are extracted from each spec¬ 
tra, inverse Fourier transformed, weighted by the appropriate 
component of the current viewing direction as shown in the 
figure and summed, producing an image that is depth cued for 
that viewing direction. 

To summarize, we have replaced per-voxel shading 
followed by projection with projection followed by per-pixel 
shading. This transformation is allowed because integration 
is a linear operator and the current viewing direction is con¬ 
stant for all voxels. Both formulations produce identical 
results, but the latter allows a more efficient implementation 
using Fourier projections. 


32 . Directional shading 

In the previous section, we showed that a per-voxel dot 
product followed by projection could be replaced with projec¬ 
tion followed by a per-pixel dot product There are other 
places where dot products appear in per-voxel shading 
models. Most notable is probably the |N • L| product giving 
the iuadiance on a surface of orientation N produced by a 
parallel light source originating from direction L. Unfor¬ 
tunately, the presence of the nonlinear absolute value operator 
prevents evaluation of this dot product using the Fourier 
projection-slice theorem. In this section, we develop an alter¬ 
native, physically valid, shading model in which illumination 
from a parallel light source is replaced by illumination from a 
hemispherical source. It will be shown that images based on 
this model can be rendered efficiently using the projection- 
slice theorem. 

Let us begin with Lambertian reflection and parallel 
illumination. We define a volume of albedo Q(x t y,z) and an 
orientation field N (x,y,z). The latter is typically estimated 
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Figure 5: Hemispherical illumination of a surface. 
Irradiance is equal to integral of the projection of 
shaded portion of the hemisphere down onto the plane 
containing the surface. 

from the scalar input data using a central difference operator 
[Levoy88]. Ignoring attenuation, the energy I P arriving at 
point P (b) along viewing ray P(/) is given by 

b 

/<■=//, n (P(0) IN (P(0) • LI dt (3.2.1) 

a 

where l, is the intensity of a parallel light source originating 
from direction L. This expression assumes illumination in 
both the L and -L directions. One-side illumination is 
obtained by replacing |N • L| with max(N • L, 0). 

Since the absolute value (or max) operator is nonlinear, 
we cannot factor lighting direction L out of the integral as we 
did with viewing direction V in section 3.1. Omitting these 
operators entirely would cause voxels containing backward- 
facing normals (relative to the light source) to produce nega¬ 
tive light that would combine under integration with positive 
light produced by voxels containing forward-facing normals 
to yield incorrect images. 

To eliminate this problem, we consider an alternative 
illumination method - an emissive hemisphere. The irradi- 
ance E x on a surface having normal vector N illuminated by a 
hemisphere whose pole points in direction L as shown in 
figure 5 is equal by Nusselt’s analogue (as described in 
[Cohen85]) to the projection of the visible portion of the hem¬ 
isphere down onto the plane containing the surface, or 

JL JL 
2 2 

*>-<. I J cos<t> cos 2 vy dy dty 

T + e *f (3.2.2) 

= nil + n/2 cosy 
= n/2 + rc/2 (N • L). 

Assuming Lambertian reflection with albedo ft(x,y,z), we 


Figure 6: Algorithm for generating a directionally 
shaded rendering of a volume dataset. During 
precomputation, volume I, ft is multiplied by 
orientation field N. For each view, the extracted slices 
are multiplied by lighting direction L. 


have 


> = jl. a (P(0) [71/2 +ic/2 (N (P(t» • L)] dt. 


(3.2.3) 


Although seldom seen in the graphics literature (a notable 
exception is [Nishita86]), this shading model is entirely plau¬ 
sible - it models the appearance of diffuse objects outdoors on 
a cloudy day (ignoring interreflection among objects on the 
ground). The directional shading provided by this model is 
less sensitive to surface orientation than shading arising from 
parallel illumination, just as our perception of shape is poorer 
on a cloudy day than on a sunny day. Nevertheless, the model 
is a significant improvement over ignoring scattering effects 
entirely. 

Continuing our derivation, we observe that equation 
3.23 is nearly identical to equation 3.1.2. If we wish to sup¬ 
port rotation of the volume relative to the light source yet 
avoid recomputing the 3D frequency domain representation 
for each view, the integrand must be independent of lighting 
direction. Looking at the equation, we see that hemisphere 
pole direction L is independent of ray parameter t and may be 
factored out of the integral. As in the case of depth cueing, 
this manipulation produces four integrals that can each be 
evaluated using the Fourier projection-slice theorem. Repeat¬ 
ing the steps in section 3.1, we obtain the following expres¬ 
sion for output image / 

/ = L x n v [(n/2 + n/2 A/ X (x,y,z)) /, ft (x,y,z)] 

+ L, n v [(rc/2 + ti/2 N y (x,y,z)) I, fl (x,y,z)] ^ 

+ L, Tlv[(n/2+n/2N,(x,y,z))I,Cl(,x,y,z)] 

+ c 2 n „[/, n ( x,y,z )] 


where N„(x,y,z) is the ^-component of the orientation field at 
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location (x,y,z) similarly for N y and N tf and c 2 = 7i/2- 
n/2 (L x +L,+L,). 

The algorithm is shown in figure 6. Four copies of 
/, Q (x,y,z) are created. Each is multiplied by the appropriate 
component of N (x,y,z) and Fourier transformed, yielding 
four 3D spectra. Because of the method used to compute N, 
the first three spectra essentially represent volumes that have 
been gradient-enhanced in one of the object space coordinate 
directions. The fourth spectra represents the original volume. 
For each change in view or lighting direction, slices are 
extracted from each spectra, inverse Fourier transformed, 
weighted by the appropriate component of the current lighting 
direction as shown in the figure and summed, producing an 
image that is directionally shaded for that lighting direction. 

As in the case of depth cueing, we have replaced per- 
voxel shading followed by projection with projection fol¬ 
lowed by per-pixel shading because one of the terms in a dot 
product - the current lighting direction - is constant for all 
voxels. 


3*3. Depth cueing & directional shading 

If depth cueing and directional shading as defined in 
sections 3.1 and 3.2 can each be expressed as linear combina¬ 
tions of Fourier projections, it should be possible to combine 
them, producing a composite shading model that exhibits both 
effects and which is expressible as a linear combination of 
Fourier projections. 

Multiplying the integrand in equation 3.2.3 by the 
linear attenuation factor in equation 3.1.2, we obtain for a 
viewing ray 

b 

1 P = 11, Cl (P(0) [ji/ 2 + n/2 (N (P(t) - L)] 

a (d.J.IJ 

x [t/2 + t/2(D(P(0)* \)]dt. 

After we factor and apply the Fourier projection operator, we 
obtain for the output image 

7 = 1 7t/2 V L • n v [(l/2 +1/2 D n (x,y,z)) I, D (x.y.z)] 

(3.3.2) 

+ xn/2 c 3 T1 V [I S D (x,y,z)] 
for c 3 = 1/2 - 1/2 X V L . where 

i 1 

V L = {VJL xt VA, VyL„ V y L yt V y L tt 

V t L x% VA, V t L,y x , V y9 L x , Ly, Lj) 
and 

D n = (DA. DA. DA. DA, DA. 

DA. DA. da. D x , D r D x% N x> N y , N g ). 

This requires the extraction and inverse Fourier transforma¬ 
tion of 16 slices from 16 precomputed spectra. While this 
algorithm is still asymptotically less expensive than conven¬ 
tional volume rendering, its high constant makes it slower 
except for very large volumes. We note that our composite 
shading model ignores emission p C in equation 3.1.2. In the 
context of a directional shading model, this term can be 
thought of as representing ambient illumination. If included. 


it adds another 3 terms to equation 3.3.2 and hence another 3 
slices that must be extracted. 


3.4* Specular reflections and other extensions 

Shiny surfaces exhibit a greater change in reflected 
light intensity for small variations in surface orientation than 
dull surfaces. A common formulation of the specular term in 
Phong’s illumination model [Phong75] is | N * H | * for some 
exponent rt where H is the vector halfway between the direc¬ 
tions of the light source, L, and the viewer, V. Using hemis¬ 
pherical illumination to avoid the absolute value operator, we 
have for a viewing ray 

b 

Ip =\ I, Cl (P(0) [n/2 + nil (N (P(t) • H)*] dt. (3.4.1) 

a 

After we factor and apply the Fourier projection operator, we 
obtain for the output image 

/ = H n • n„[(n/2 + n/2 N n (x,*z)) I, n (x,y,z)] 

(3.4.2) 

+ c 4 n v [/, Q (x,y,z)] 

for suitable constant c 4 where H n and N n contain the list of 
factors in the polynomial (N • H)\ The number of terms and 
hence the number of spectra is given by the multinomial 
theorem [Knuth73]. For n = 10, a typical value, we require 
286 precomputed spectra, making this extension slower than 
conventional volume rendering even for very large volumes. 

For scenes of high depth complexity, linear attenuation 
may provide an insufficiently steep intensity falloff to disam¬ 
biguate depth relationships. Equation 3.4.1 can be adapted to 
provide n-degree polynomial depth cueing by replacing N and 
H with D and V, respectively, and adjusting the constants. 
For example, inverse square law falloff would be given by 
(D • V) 2 and would require 10 slices from 10 precomputed 
spectra. 


4* Implementation and results 

Figures 7 through 13 each show the application of a 
different shading model to a 96 3 voxel cube extracted from a 
CT scan of a human skull mounted in a lucite head cast. The 
right eye orbit and a portion of the zygoma (temple) are visi¬ 
ble at center and left, respectively, and the lucite nose is visi¬ 
ble at lower right To facilitate comparisons, figure 7 is a con¬ 
ventional volume rendering generated using the techniques 
described in [Levoy88]. 

Figures 8 through 13 were generated using the Fourier 
projection-slice theorem as follows. The input data was first 
edge enhanced and nonlinear windowed (except for figure 8 
as noted below), then surrounded with a 16% margin of zeros 
to avoid the ghosting problem discussed in section 2. The 
resulting 128 3 voxel cube was weighted according to equa¬ 
tions 3.1.4, 3.2.4, or 3.3.2, and Fourier transformed using a 
3D FFT algorithm. For each view, slices of width equal to the 
spectra were extracted using a tricubic interpolating spline, 
inverse transformed using a 2D FFT algorithm, weighted 
appropriately for the current viewing or lighting directions, 
and summed to produce a 128 2 image. 
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Memory requirements varied from 48 megabytes for 
equation 3.1.4 to 256 megabytes for equation 3.3.2. Per-view 
rendering time varied from 6 seconds for equation 3.1.4 
(about 1.5 seconds per extracted slice) to over a minute for 
equation 3.3.2 (due mostly to paging). Timings are for a 36 
MHz Silicon Graphics 4D/320 VGX and a sequential C 
implementation. No attempt was made to optimize the code. 

By replacing the FFT with an FHT, one can immedi¬ 
ately halve the per-view rendering time and the memory 
requirements. By optimizing the filter and trimming the zero 
padding, another factor of two speedup and some reduction in 
memory usage can be expected. For greater speedups, the 
algorithm can be parallelized at either a fine grain (within the 
FFT) or a coarse grain (between slices). It is also amenable to 
hardware acceleration using video coprocessors or video digi¬ 
tal signal processors (DSP). These devices frequently operate 
on fixed-point representations of frequency spectra, suggest¬ 
ing additional opportunities for reducing memory costs. 

Figure 8 shows an attenuation-only projection (i.e. an 
X-ray) generated using a variant of equation 1.2. The lack of 
occlusion and shape-from-shading cues is evident. Figure 9 
shows an attenuation-only projection of the input volume after 
edge sharpening and nonlinear windowing of the density 
range. Figures 10 through 13 also include sharpening and 
windowing. Figure 10 shows a depth-cued X-ray generated 
using the shading model described in section 3.1. The 
weighting assigned to each point in the volume falls off 
linearly with increasing distance from the viewer. Figure 11 
shows a directionally shaded X-ray generated using the shad¬ 
ing model described in section 3.2. Each point in the volume 
is weighted as if it sat on a diffusely reflecting surface 
illuminated by a hemisphere placed on the right side of the 
picture and shining leftwards. Figures 12 and 13 show two 
views that each include both depth cueing and directional 
shading using the model described in section 33. Although 
no occlusion occurs in either of these figures, the shading 
effects enhance our ability to perceive shape and discern spa¬ 
tial relationships. When displayed in motion, the visual 
impact is sufficiently strong that viewers often overlook the 
fact that these are X-rays, not opaque renderings. 


5. Conclusions and future work 

We have described a family of algorithms for generat¬ 
ing realistically shaded renderings of volume data using the 
Fourier projection-slice theorem. While the images produced 
using these algorithms do not exhibit occlusion, they provide 
sufficient depth and shape information to make them 
moderately understandable in static views and very under¬ 
standable in motion sequences. In either case, they produce 
images richer in visual cues than the attenuation-only projec¬ 
tions with which the projection-slice theorem is usually asso¬ 
ciated. 

One aspect of the Fourier approach that we have not 
discussed is how it impacts interactive data segmentation. 
The edge sharpening and nonlinear windowing operators used 
in figures 9 through 13 are applied before the forward 3D 
Fourier transform. As a consequence, changing them requires 
recomputing the spectra. Depending on the application, it 
may be undesirable to hardwire data segmentation in this way. 


Several strategies are available for avoiding this 
difficulty. If the data segmentation operator is expressible as 
a linear combination of basis functions, it can be handled 
using the same scheme we have employed for shading 
models. For example, if a CT dataset is segmented into fat, 
muscle, and bone volumes during preprocessing, yielding 
three spectra, the weights and colors assigned to each material 
can be altered at image generation time by applying different 
weights to the slices extracted from each spectrum. 

If the segmentation operator is expressible as a convo¬ 
lution, it may be possible to implement it equally or more 
efficiently as multiplication in the frequency domain. For 
example, the edge sharpening operator used in figures 9 
through 13 could have been implemented as multiplication of 
each extracted slice by a high-frequency emphasis function. 
Using this approach, the amount and type of enhancement can 
be specified at image generation time. 

Conversely, if the segmentation operator is expressible 
as multiplication by a function of position in the spatial 
domain, it may be possible to implement it as convolution in 
the frequency domain. For an intriguing example of this stra¬ 
tegy, consider the linear depth cueing described in section 3.1. 
Although the Fourier transform of this function is infinite in 
extent, equally effective depth cueing can be obtained using 
the first half period of a suitably oriented cosine function. 
The Fourier transform of a cosine is two delta functions. For 
each view, we determine the placement of these deltas, 
preconvolve them with our interpolation filter (which amounts 
to summing two shifted copies of the filter function), and use 
the composite filter during slice extraction. This technique 
requires only one spectrum as input, making it potentially 
more efficient than the technique described in this paper. We 
are currently investigating this approach. 

Looking beyond shading models, sums of Fourier pro¬ 
jections can also be used to compute integrals over irregular 
spatial domains. Specifically, the integral of any function 
over the ray segment connecting two points in the interior of a 
volume can be computed as a sum of integrals over spatially 
disjoint intervals of the ray that taken together span die seg¬ 
ment Assuming well behaved functions, each integral can be 
computed using the projection-slice theorem. We are 
currently investigating an algorithm based on recursive subdi¬ 
vision of volumes into octants that can compute images in 
O (N 2 log 2 N) time. Possible applications include rendering 
of spatially clipped volumes for medical visualization, fast 
calculation of inter-element attenuation for zonal radiosity 
algorithms, and approximate visibility determination for 
geometrically defined scenes. 
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Figure 7: Conventional volume rendering of a CT 
scan of a human skull mounted in a lucite head cast. 


Graphics Interface ’92 




69 



Figure 8: Attenuation-only projection (i.e. X-ray). 
All figures on this page were generated using the 
Fourier projection-slice theorem. 



Figure 9: Attenuation-only projection after edge 
sharpening and nonlinear windowing. Figures 10-13 
also include these enhancements. 




Figure 10: Depth-cued X-ray generated using the 
shading model described in section 3.1. 


Figure 11: Directionally shaded X-ray generated 
using the shading model described in section 3.2. 




Figure 12: Depth-cued and directionally shaded X- Figure 13: Rotated view of depth-cued and 

ray generated using the shading model described in directionally shaded X-ray. 

section 3.3. 
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An Extended Cuberille Model for Identification and Display of 
3D Objects From 3D Gray Value Data 
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Abstract Edge elements have little geometrical information be¬ 

cause their shape and size are undefined. As a result, they 
This paper presents an extended cuberille model for the cannot be used to construct an object. What is needed are 
identification, reconstruction, and display of 3D objects modeling primitives whose shape and size are defined and 
from 3D gray value data. 3D edge elements are gradients whose orientations are close to their edge counterparts, 
detected at voxels, and the orientations of gradients are |.e., a model. Currently available models are: cuberille 
quantized to 26 directions. The edge elements are then [HL79] and a polyhedral model [LC87]. Both models use 
converted to the extended cuberille model. The model thresholding to identify objects. 

lias four volume primitives. Besides a cube, voxels are ex- j n the cuberille model, an object is frequently repre¬ 
tended to include three other polyhedra so that voxel faces sented as a collection of cube shaped voxels. Volume ren- 
are compatible with 26 gradient orientations. The mer- dering algorithms can be seen in [Rey87]. An object can be 
its of the three representation schemes: space occupancy represented by voxel faces. Algorithms tracking a surface 
enumeration, octree, and surface representation by the ex- Q f a binary image have been seen in [AFH81] and [GU89]. 
tended cuberille model are briefly discussed. To identify Because of the regularity of voxels, an object can also be 
border voxels, asymmetric Gaussian filters are applied to organized as an octree. Octree related algorithms include 
compute second derivative at each voxel. Conditions are t ree generation algorithms [Sam80, YS83], set operation 
defied for identifying border voxels based on the sign of algorithms [HS79], geometric transformation algorithms 
the second derivative. From these conditions, there exists [JT80, Mea82), and display algorithms [ZD91]. 
exactly one layer of border voxels, and subsequent surface j n the polyhedral model [LC87], a cube is made up of 
tracking is therefore straightforward. Experimental results e igbt voxels in eight vertices, that are either inside or out- 
of 3D surface identification, tracking, and display by the s | ( j e determined by thresholding. The algorithm marches 
extended cuberille model on test data and medical data cube by cu b e , creating triangle faces to model a piece of 
are given. Because there are only four types of external sur f ace within each cube. A surface made of triangle faces 
voxel faces in the model, a surface of any object consists can be displayed using a graphics package, 
of only the four types of external voxel faces. Some results use 3D edge detection to identify a sur¬ 

face. The 3D boundary following algorithm by [CR89] 
1 Introduction constructs a 3D boundary by stacking 2D boundaries, that 

are extracted using a graph search algorithm [Mar76]. 

This paper presents an extended cuberille model for iden- Ik the following, the motivation to extend the cuberille 
tification, reconstruction and display of 3D objects from model for 3D edge detection and surface construction is 
3D gray value data. In 3D identification, thresholding is discussed. 

widely used but restricted. In a variety of applications. Suppose the 3D edge operator [MR81] is applied to gray 
not many objects can be identified by simple threshold- value data. It detects edges by computing the gradient 
ing. Identification based on 3D edge detection is a. more at each voxel. The gradient at a voxel can be written 

general method. Some 3D edge operators have been pro- as a vector V = (V x , V y , V-) with the three compo- 

posed to detect edge elements [MR81] using gradients. nents indicating intensity changes along three principal 
The problem of how to group detected edge elements axes. The magnitude of the gradient, approximated by 
to reconstruct an integral object has not been discussed. |V| = [V*] -f |V y | + |V 2 |, indicates the possibility that 

An integral object representation is a relatively new topic the voxel is an edge element. The larger the magnitude, 

[Man88]. It implies that if an object is represented bv its the more likely that it is an edge voxel. The direction of 
surface, the surface must be dosed and without missing the gradient, vector determines the edge orientation. For 
faces. If the object is represented bv a collection of voxels, simplicity, the direction of a gradient is quantized to one 
each voxel is a solid with a t hickness so t hat the space of 26 directions (see Fig 1. ) 

occupied by the object can be measured. Suppose an edge is detected at a voxel with orientation 
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Figure 1: Three of the 26 gradient directions 

(1,1,1), as shown ill Fig 2(a). The cuberille model would 
represent the edge by three voxel faces whose average nor¬ 
mal is (1,1,1), as shown in Fig 2(b). But it is natural 
to represent the edge by a face whose normal coincides 
with the edge orientation. For example, the triangle face 
in Fig 2(c) would construct a smoother surface than the 
three voxel faces. Of course, any other face with the same 
normal could also be chosen as a face primitive. So the 
problem is to choose a set of modeling primitives whose 
normals reflect, the 26 gradient directions and therefore 
result in a smoother surface. To solve the problem the 
Extended Cuberille model is developed in the next, sec¬ 
tion. 


y 



(a)Gradient vector (b)The cuberille model (c)Thc extended cuberille 

model 

Figure 2: Different modeling primitives 


2 The Extended Cuberille Model 

To model voxels, the modeling volume primitives are ex¬ 
tended to include three other poly lied ra (see Fig 3) so that 
the voxel faces are compatible with the 26 gradient orien¬ 
tations. 

0 P 4 E3 

(a)TypeJ (b)Type 2 <c)Type_3 (d)Typc 4 

Figure 3: The set of volume primitives 

As shown in Fig 3, cutting a cube voxel through diago¬ 
nals of top and bottom faces gives the second polyhedron 
with a face orientation of (1,0,1). Chitting a cube voxel 
through three vertices defines the third and the fourth 
polyhedra, and gives a face oriented at ( — 1,1.1). Since 
the last three polyhedra are not symmetric, it is assumed 


that a volume primitive is invariant under 90 degree rota¬ 
tions about any principal axis. Hence the faces of the set 
of volume primitives give 26 orientations. Furthermore, 
to be able to represent objects hierarchically, it is also as¬ 
sumed that a primitive scaled by a power of two is also 
the same. The formal definition is given in the following. 

Let V be the set of four volume primitives shown in 
Fig 3, i.e., V={typeA, type-2, type. 3, typeA). Let T 
denote the transformation of 90 degree rotations about 
the principal axes or power of two scalings. Thus T is an 
equivalent relation on V and each type-k,k = 1,2,3,4, is 
an equivalence class by T. Let I be the 3D coordinate set, 

1 C Z 3 , Z is the set of integers, and / is a map, / : / —* V, 
then 

Definition 1 A voxel Vi is a pair of (i,f(i)), where /(t) 
is a volume primitive, i.e., f(i) € V, of minimum scale and 
i are the coordinates, i el. 

Arguably, a cube could be cut other ways, as shown in 
Fig 4, that could also result in 26 face orientations. But 
the set V” is better than other choices for it is the smallest 
set that is closed under the subdivision operation. 

9 4 Q Dfl 

Figure 4: Another set of volume primitives 

Theorem 1 The set V is closed under the subdivision op¬ 
eration , and it has the least cardinality among those having 
the same property. 

Proof: It follows directly from subdiving the four vol¬ 
ume primitives, as shown in Fig 5(a) to (d), that the V 
set is closed under subdivision. It remains to be shown 
that it has the least cardinality. Let U be any set of vol¬ 
ume primitives whose type-3 and type-4 polyhedra are 
obtained by cutting a cube voxel, but not passing through 
face diagonals. Fig 5(e) shows different cuttings to get a 
face normal (1,1,1). Assume the edge of a face is cut at a 
ratio «/(l - u), as shown in Fig 5(f), then subdivision of 
the cube results in a new polyhedra with edge cut ratio of 
#i/(0.5 — «). If U is closed, it must include at least two sets 
of polyhedra from the two ways of cutting to give the same 
face orientation. The same argument, holds for a type_2 
polyhedron. It follows \U\ > \V\. 

Since the set V is closed under subdivision, it is also 
possible to represent objects by octrees. 

The mathematical definition of an object in the ex¬ 
tended cuberille model is given in the following: 

Definition 2 .4»» object S is a regularized union of vox¬ 
els. S = where U* denotes the regularized union 

operation. 

The regularized set operations, see [Man88], defines A U* 
B = c(HMJB)), where c(AU B) and i(AuB) denote the 
closure and interior of A U B. 
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OOType.l (b)Type_2 (c)Typc_3 (d)Typc.4 



(c) Different cutting to get (0 Subdividing introduces 

a face with normal (1,1,1). a new polyhedron. 


Figure 5: The set V is closed under subdivision. 

The definition gives the constructive process: because 
voxels can only touch in faces, edges or vertices, it implies 
v, fl* Vj = <j> for t ^ j. Thus U* t>, removes all internal voxel 
faces to make a solid object with one interior enclosed by 
a surface. 

There are also several ways to represent objects in the 
extended cuberille model, in the next Section, the merits 
of three representation schemes, the enumeration scheme, 
octrees, and surface representation, are briefly discussed. 

3 Merits of Representation Schemes 

The spatial enumeration scheme in the extended cuberille 
model lists all voxels whose spaces are either fully or par¬ 
tially occupied by ail object. The interior of an object is 
filled by type.l voxels because they are fully occupied. All 
type_2 to type.4 voxels represent partially occupied space 
and are therefore on the border of the object. Type-1 
voxels could also be on the border if the surface passes 
through at least one of its faces. For example, Fig 6(a) to 
(c) show an mathematically defined object, its space enu¬ 
meration representation in the extended cuberille model, 
and its space occupied in the cuberille model. Although 
the representation gives a smoother surface, it is not. as 
storage efficient as its counterpart in the cuberille model, 
for it needs two arguments - coordinates, type or gradient, 
direction - to list a voxel. 



(a) Defined object. (b) Space enumeration (c) Space enumeration 
in the extended cuberille in the cuberille model, 
model. 


An octree representing the object in the previous example 
is shown in Fig 7(d). Compared with the octree in the cu¬ 
berille model (Fig 7(c)), the octree in the extended model 
is more concise because it includes leaf nodes to represent 
certain partially occupied space. Now consider two ex¬ 
treme cases shown in Fig 8. For the object in Fig 8(a), 
the octrees for the two models would be the same. For the 
object in Fig 8(b), the subdivision around the border in 
the cuberille model would reach the voxel level, whereas 
there is no subdivision in the extended model, because 
the root node is a leaf node. For the object with a surface 
slope as shown in Fig 8(c), the subdivision in the extended 
mode would, in the worst case, reach the voxel level. The 
following conclusion results: 

Theorem 2 To represent an object, the size oj an octree 
in the extended cuberille model is at most the size of an 
octree in the cuberille model . 


y 



z 

(a) Space subdivision. (b) The subdivided object. 



0123456701234567 01 234 567 

(c) The Octree in die (d) The octree in the 


cuberill model. extented cuberille model. 

Figure 7: An octree represented object in the ex¬ 
tended cuberille model. 



(a) (b) (c) 

Figure 8: Comparing tree size in two models. 


A surface representation lists all voxel faces on the sur¬ 
face. Since any object is made up of four types of voxels, 
and by examing Fig 3, there are only four types of voxel 
faces, as shown in Fig 9. It therefore follows: 


Figure 6: A mathematically defined object and its 
representation. 


i iieorcm o I he Jour types oj external voxel faces 
in Fig 9 can close the surface of any object. 


Since there are only four types of voxel faces in the ex- 
An octree in the extended cuberille model may have tended cuberille model, it is expected that the implemen- 
^^types of leaf nodes. In addition to the cube shaped tatiou of a surface representation is not very complicated 
and white leaf nodes, lypc.2. type.3 and type.4 leaf On ll.e other hand, the surface of an object may not al- 
oresent partially occupied space and arc all black, ways be very smooth. 
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(a) An object defined by thresholding. (b) The object border voxels 

defined by conditions (3). 



(c) Converted to the extended (d) The embedded graph. (e) Surface closure, 

cubrille model. 

Figure 13: An example to show the surface tracking 
steps. 

Surf ace_.tr acking(obj ect) 
char *object; 

{ 

1 reacL data(object); 

2 edge_detector(); 

3 zero_crossing(); 

4 init_lists(); 

5 nake_table(); 

6 Border_face_tracking(); 

7 Close_surf aceO ; 

> 

Figure 14: The surface tracking algorithm 

Before tracking the border voxel faces, from lines 1 to 
line 3, the procedure readjdataO reads the data named 
object to an 3D unsigned character array scene. The 
procedure edge-detector0 applies a 3D edge operator 
to the data array, resulting in an 3D unsigned charac¬ 
ter array grad_loc_dir of Iocation/directioii codes, and a 
3D short integer array mag of gradient magnitudes. The 
loc.dir code of a voxel reflects the face normal that is 
necessary information for display. The gradient magni¬ 
tude is used as a rough threshold to assist in border voxel 
identification. The procedure zero-crossing() computes 
u»(ar,j/, 2 ) at every voxel £). resulting in a 3D short 

integer array w. Implementations of these procedures are 
straightforward, hence no details will be given here. 

In lines 4 to 5, procedures init_lists() and 
make.table() initialize all associated data structures, 
such as queue, lists, tables, etc., used by the 
Border_f ace-t racking () and Close_surf ace() proce¬ 
dures. 

As shown in Fig 13(d), if border voxels and the ad¬ 
jacency relations between voxels can be modeled as a 
graph, in which nodes are border voxels and edges are 
adjacency relations between pairs of voxels, a standard 
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breadth-first search can be used for tracking border vox¬ 
els. The Border-face_tracking() procedure is outlined 
in Fig 15. 

int x,y,z; 

Q_CELL * curr en t _ v ox e 1; 

Border_f ac e.tracking() 

{ 

unsigned char index; 

1 get.start.face; 

2 span.start_face(); 

3 while(current_voxel=de_queue_front() { 

4 x=current_voxel->x; 

5 y=current_voxel->y; 

6 z=current_voxel->z; 

7 index=grad_loc_dir[x][y][z] ft “HARK; 

8 Neighbor.face(index); 

9 add_to_display_table(x,y,z,index); } 

> 

Figure 15: The border face tracking procedure. 

The underlying data structure is a queue, where each 
queue cell has three integers, x,y,z, for recording the co¬ 
ordinates of a border voxel. 

In the procedure, the micro get .start-face in line 1 
reads the coordinates of a start, border voxel, and the pro¬ 
cedure span_start-face() in line 2 searches its adjacent 
border voxels, marks them and queues them. 

The while loop in line 3 starts tracking border voxels. 
A queue cell is obtained from the queue as the current 
voxel. In line 8, procedure Neighbor_face() is called to 
search for the current voxel’s adjacent border voxels. It 
scans all adjacent voxels, testing if any satisfies the condi¬ 
tion (3). Meanwhile if a border voxel is unmarked, mark 
it and queue it. After the procedure Neighbor_faceO 
returns, add-to_display_table() is called to add the cur¬ 
rent voxel coordinates, x y z, and its loc-dir code, index, 
to a structure array to display the voxel face later. While 
the queue is not empty, the loop continues to the next 
cell in the queue. The while loop stops once the queue is 
empty. 

The time complexity of Border_face-tracking() de¬ 
pends on the while loop and the procedure call 
Neighbor-f ace(). It is analyzed below. The data to start 
tracking are three 3D arrays, grad-loc.dir, grad_mag, and 
w. Bit seven of grad-loc.dir is designated for mark¬ 
ing, hence checking and marking a voxel visited can be 
done in constant time. As a result, the time to execute 
Neighbor-face() depends on the number of adjacent vox¬ 
els that, a border voxel could have. Since each unmarked 
border voxel is placed in the queue once, the while loop 
is executed only once for every border voxel. Denote the 
number of adjacent voxels that a border voxel has as k , and 
the total number of the border voxels as ns, the time com- 
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plexity of BorderJace.trackingO is therefore O(kns)- 

For a given object, ub is determined by the condi¬ 
tions (3) and is fixed. But k varies with the number of 
adjacent voxels that a border voxel could have. A way to 
define the adjacency relation between pair of voxels is by 
digital topology, where two voxels are adjacent each other 
if they are either face, edge or vertex connected. By this 
definition, each voxel has 26 adjacent voxels. Hence the 
constant k is about 26. 

Observe from Fig 13(c), however, that in the extended 
cuberille model, every border voxel can be converted to 
a face primitive, and all the face primitives are on a sur¬ 
face. Intuitively, a face normal shouldn’t have dramatic 
change from one border voxel to an adjacent, one because 
the surface is supposed to be smooth. This suggests that 
the face normal of a border voxel can be used to assist 
in defining adjacent voxels. This results in less than 26 
adjacent voxels. Furthermore, face primitives are either 
square or triangular face, see Fig 3. hence n fare primitive 
has at most four edges. As a result, there art* at most 
four ways to connect the current face to the next face. 
The neighbor connections are called outways of the cur¬ 
rent voxel. In the implementation, the Neighborjface() 
procedure searches outways instead of all adjacent voxels. 
Whenever a border voxel of an outway is found, the search 
breaks and proceeds to the next outway. This reduces the 
constant k to about half, and speeds up border face track¬ 
ing. No further details will be given. The interested reader 
may refer to [Qu92]. 

For each adjacent border voxel found for an outway, 
the Close_surface() procedure checks face connections 
between the current face primitive and the adjacent face 
primitive. If they are connected, do nothing. Other¬ 
wise the procedure tries to find missing voxel faces and 
adds the missing faces to the display table. Since pro¬ 
cedure Close_surfaceO executes only for disconnected 
border voxel faces, it is expected to be faster than 
Border _f ace.tr ac king (). 

Back to the Surface-tracking algorithm in Fig 14, 
since the first three procedures work on the entire data 
volume, the algorithm is a volume based algorithm after 
all. Once voxel faces of a surface are saved in the display 
table, however, the time of all subsequent operations such 
as display, rotation and scaling, etc., operate on the bor¬ 
der voxel faces, and therefore is an order of the number of 
border voxel faces of an object, i.e.. 0(////). 


7 Experimental Results 

Fig 16 shows surfaces of a test object and a medical ob¬ 
ject obtained from real data. The test object is a sphere 
of volume 40 x 40 x 40 with 16 gray values. Since the 
test object is darker than the background, its border is a 
positive layer of voxels. The surface of the sphere has 509 
border voxels and 797 voxel faces. It can be seen that t he 
surfaces consist of four types of voxel faces. The images is 
displayed using a graphics package WIN DUB [C; B87) on 
a Sun3/60. 




Figure 16: Surfaces of a test object and a medical 
object from real data. 


The data size of the medical object is of 208 x 208 x 26, 
resulting from linearly interpolating 6 CT slices produc¬ 
ing 5 between successive pairs. The gray values were also 
linerly mapped to the range from 0 to 255. The object 
has a brighter color than the background, therefore, its 
border is a negative layer of voxels. There are 17,742 
border voxels detected and 26,409 voxel faces on the sur¬ 
face. Running gprof shows that the execution time for 
Border_f ace-tracking on a Snn4 is about 55.32 seconds, 
and checking face connections and close the surface takes 
33.26 seconds. 

The image is displayed on a Silicon Graphics Iris station 
4 D/35. To display 26,409 faces takes only seconds. 
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8 Conclusion 

The extended cuberille model introduced in this paper 
provides a way to identify, reconstruct and display 3D ob¬ 
jects based on 3D edge detection rather than thresholding. 

3D edge elements are gradients, and orientations of gra¬ 
dients are quantized to 26 directions. The model has four 
volume primitives. Besides a cube, voxels are extended to 
include three other poly lied ra so that voxel faces are com¬ 
patible with 26 gradient orientations. The merits of the 
three representation schemes: space occupancy enumera¬ 
tion, octree, and surface representation by the extended 
cuberille model are briefly discussed. 

To identify border voxels, asymmetric Gaussian filters 
are convolved with the gray value data to compute second 
derivatives of intensity changes at every voxel. Conditions 
for identifying border voxels based on the signs of the sec¬ 
ond derivatives are given in inequalities (3). From these 
conditions, there exists exactly one layer of border vox¬ 
els, and subsequent surface tracking could be simply be a 
breadth first search. 

A surface tracking algorithm using the conditions to 
identify border voxel is given. The surface tracking algo¬ 
rithm has two tasks: traverse border voxel faces and close 
a surface. Analysis and experimental results have shown 
that the the time complexity of the surface tracking algo¬ 
rithm depends on the border face tracking, ami is an order 
of number of border voxels of a tracked object. 'Tracking 
outways instead of 26 adjacent voxels is further suggested 
to speed up border voxel tracking. 

Experimental results of 3D surface identification, track¬ 
ing, and display by the extended cuberille model on a test 
object and a medical object from real data are given. Be¬ 
cause there are only four types of voxel faces in the model, 
a surface of any object consists of only four types of voxel 
faces. 
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Abstract 

We describe an experimental, knowledge-based, virtual-world 
system that uses a monocular “see-through'’ head-mounted 
display to overlay graphics on the user's view of the real 
world. In a simple equipment maintenance domain that we 
have developed, the overlaid graphics include 3D represen¬ 
tations of actual physical objects, textual annotations and cal¬ 
louts, and virtual metaobjects, such as arrows. A set of 3D 
position and orientation sensors monitor the user's head and 
several of the objects in the world. Our system includes a 
knowledge-based graphics component that determines what 
information to present, a display-list—based display server that 
represents the models to be displayed, and a set of sensor 
servers that track the user and selected objects. The sensor 
servers directly modify the object transformations and view¬ 
ing specification in the display list. The knowledge-based 
graphics component also receives the sensor data and uses it 
to redesign the information being presented. 

R6sum6 

Nous d6crivons un sysfeme graphique experimental a base de 
connaissances, qui enrichit la vision du monde perdue par 
l'utilisateur en super-imposant des graphiques g£n£r£s par le 
systfcme. L'utilisateur porte une lentille monoculaire semi- 
reflective sur laquelle s'affichent les images que notre 
systeme decide de montrer. Nous avons d6velopp6 le systfcme 
dans le domaine de la maintenance d'&juipement. Dans ce 
domaine assez simple, les graphiques superimposes compor- 
tent des representations 3D des objets reels, des annotations 
textuelles, et des “m6ta-objet$*' virtuels (par example, des 
fleches). Un ensemble de detecteurs 3D determinent 
1’orientation et la position de la tete de l'utilisateur et de 
certains objets dans l'environnement Notre systfcme com- 
prend trois composants principaux: un systeme de generation 
de graphiques a base de connaissances determine non seule- 
ment quelles informations doivent etre presentees mais aussi 
comment les presenter; un gestionnaire d*6cran k base de liste 
d'objets represente les modules des objets k afficher, et un 
ensemble de serveurs de detecteurs qui suivent la tete de 
l’utilisateur et certains objets. Les serveurs de detecteurs 
modifient la position des objets et les specifications de la 


sc^ne directement dans la liste des objets. Le composant 
graphique a base de connaissances re 9 oit aussi les valeurs 
retoumees par les detecteurs et utilise cette information pour 
reviser l'information k presenter. 

Keywords: knowledge-based graphics, virtual worlds, head- 
mounted displays, heads-up displays, augmented reality 

1 Introduction 

Virtual worlds that use 3D displays and interaction devices 
have the potential to make possible greatly improved explana¬ 
tions of complex physical tasks. These technologies promise 
to allow a user to view and manipulate information in ways 
that better exploit our ability to interact with 3D spatial infor¬ 
mation than does the use of flat 2D displays and input 
devices. As the richness and variety of the information that a 
system can present to a user increases, however, so does the 
difficulty of designing the presentation. Desktop publishing 
systems require more skill and time to master than do simple 
word processors; multimedia presentation editors demand yet 
more design expertise to use effectively. Perhaps the ultimate 
design demands are posed by virtual worlds, which can re¬ 
quire the coordinated design of material that affects all sen¬ 
sory modalities, and that must respond continuously to the 
user's interactions. 

Over the past years, we have been developing knowledge- 
based systems that address the automated design of presen¬ 
tations that explain how to perform simple 3D tasks. These 
systems generate static and animated graphics [6,22,14], and 
multimedia presentations [8] that satisfy a high-level expres¬ 
sion of the information to be communicated. Here, we 
describe some first steps that we have taken toward building a 
testbed system for exploring the automated design of virtual 
worlds to explain maintenance and repair tasks. In this work, 
we are concentrating on the knowledge-based generation of 
graphics that overlay the user's view of the physical world, 
and which dynamically take into account information about 
the user, task, and changes in the surrounding world. 

Ivan Sutherland, in his pioneering research on head-mounted 
displays, developed a binocular “see-through” system [23], 
Each eye viewed a miniature vector CRT, whose synthesized 
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Figure 1: Prototype see-through head-mounted display in use in testbed laser printer maintenance application. 


graphics were merged with the user’s view of the real world 
by means of a beam splitter. Since then, other graphics 
researchers have explored the use of see-through displays, 
both as part of head-mounted displays (e.g., [11, 4]), and 
desk-mounted displays (e.g., [15,21]). We have assembled a 
relatively inexpensive see-through head-mounted display, 
using a Reflection Technology Private Eye [20], a Logitech 
3D position and orientation sensor [18], and a mirror beam 
splitter. This system, shown in Figure 1 in use in an ex¬ 
perimental maintenance application described below, is a 
research prototype: it produces a dim, bilevel, monochrome 
image, and overlays graphics on a relatively narrow portion of 
the user’s visual field, providing approximately a 22° horizon¬ 
tal field of view. Nevertheless, it has proven to be a useful 
research tool. 

In the remainder of this paper, we describe the knowledge- 
based graphics component that designs what the user sees, a 
simple maintenance application with which we are ex¬ 
perimenting, and the design and implementation of our over¬ 
all system architecture. 

2 Knowledge-Based Graphics Component 
2.1 IBIS Overview 

The knowledge-based graphics component that we use is 
based on IBIS (Intent-Based Illustration System) [22]. IBIS is 
a rule-based system that designs illustrations , a term that we 
use to refer to pictures that are designed to satisfy an input 


communicative intent The communicative intent is specified 
by a prioritized list of communicative goals . Each com¬ 
municative goal specifies something that the picture is to 
accomplish; for example, to show an object’s location or 
shape. IBIS distinguishes between design and stylistic choice. 
The design of an illustration corresponds to its high-level 
structure, specifying those visual effects that must be ac¬ 
complished together to satisfy the illustration’s communica¬ 
tive goals; the styles used in an illustration represent the 
different ways each visual effect may be accomplished. 

IBIS uses two kinds of rules to design an illustration: 
methods and evaluators. A method specifies how to ac¬ 
complish a particular style or design; an evaluator determines 
whether a particular style or design has been accomplished. 
IBIS’s rules allow it to examine partial solutions and to back¬ 
track when conflicts occur. 

The illustrations that IBIS generates are dynamic, rather than 
static: IBIS can continuously receive and handle changing 
constraints and goals, while at the same time realizing a 
particular intent For example, IBIS normally determines a 
viewing specification of its own when designing an illustra¬ 
tion. If the viewing specification is provided externally, 
however, then IBIS attempts to use it in generating a picture 
that satisfies the goals, and will incrementally redesign the 
picture if the viewing specification changes. This allows 
IBIS’s user to navigate within the illustrated world by chang¬ 
ing the viewing specification. IBIS’s navigation facility is 
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different from that used in traditional “synthetic camera*’ 
graphics in that the binding between the input intent and the 
output illustration is preserved. IBIS continuously examines 
the illustration as it is altered to attempt to satisfy the 
illustration’s communicative goals. As the user changes the 
viewing specification, conflicts may occur, causing IBIS to 
adopt a new design or set of stylistic choices. Thus, the 
illustrated world itself may change as the user explores it. (A 
rudimentary precursor of this capability was introduced in 
early vector systems that made traversal of an object contin¬ 
gent upon the screen size of its precalculated extent, allowing 
an object’s level-of-detail to change with its size [24].) 

As a simple example of how the original version of IBIS 
works, consider the need to maintain object visibility. In the 
course of satisfying the input communicative goals, IBIS will 
determine that certain objects must be visible, and therefore 
unoccludable . This typically occurs if the objects participate 
directly in a communicative goal. Unoccludable objects must 
not be obscured by others in the world. IBIS can maintain the 
visibility of unoccludable objects by selecting an appropriate 
viewing specification for the illustration, by generating an 
inset sub-illustration, or by altering the appearance of the 
obscuring objects. For example, IBIS can decide not to 
render the obscuring objects at all, to render them as partially 
transparent, or to use a “cutaway” view [9]. 

2.2 Extending IBIS for a Head-Mounted 
See-Through Display 

In extending IBIS to support a head-mounted see-through 
display, we have had to take into account a number of impor¬ 
tant differences: 

• The original IBIS assumes that it generates all of 
what the user sees. In overlaid graphics, however, 

EBIS must instead enrich the user’s view of the 
world with additional information. 

• The user could modify the viewing specification 
after the original IBIS chose an initial viewing 
specification. In contrast, when designing over¬ 
laid graphics, our extended IBIS must from the 
beginning relinquish to the user all control of the 
viewing specification, since only the user can 
determine where they look. 

• The world was assumed to change only after an 
illustration was completed in the original IBIS. 

This was easy to enforce since IBIS maintained 
control over what was visible, and based its il¬ 
lustrations on a world model that was frozen 
throughout the illustration’s life. In contrast, our 
extended IBIS must take into account ongoing 
changes in the world as it generates illustrations, 

• The original IBIS was responsible for achieving 
all communicative goals itself. Because the ex¬ 
tended IBIS controls only what it generates, the 
user becomes an active participant in achieving 
the communicative goals. For example, if a goal 
is to specify an object’s position, and the object’s 
projection does not lie within the window, IBIS 
must indicate to the user where they should look. 


Based on these differences, we have developed a preliminary 
set of new IBIS rules to handle the input communicative 
goals. At its core is a set of low-level style rules, each of 
which may be invoked for any object. These include: 

• Make an object visible . If the object’s projection 
falls within the overlay window and is not oc¬ 
cluded by other objects, nothing need be done: 
the object is not rendered. (In contrast, in the 
original IBIS, the object would have to be ren¬ 
dered.) If the object’s projection is within the 
window, but is occluded wholly or in part by 
other objects, the object is marked for rendering 
to allow it to be seen through its occluders; the 
object’s rendering style will be determined by 
other rules. (Occlusion tests are performed with 
a fast image-precision algorithm described 
in [9].) If the object does not project to the win¬ 
dow, the goal fails. 1 

• Highlight an object If the object is within the 
window, the object is marked for rendering and 
tagged with a highlighted style (e.g., solid lines); 
otherwise, the goal fails. 

Representative higher-level design rules that invoke these are: 

• Show an object. If the object is visible , nothing 
need be done; otherwise, the find goal is ac¬ 
tivated. 

• Show the location of an object. If the object is 
visible , it is highlighted ; otherwise, the find goal 
is activated. 

• Find an object A callout (currently, a “canned” 
text string label) is created for the object, fixed in 
2D relative to the overlay window. A dotted 
rubberband leader line connects the callout to the 
object. (If the object does not project to the 
overlay window, then the dotted line extends to 
the edge of the window in the direction of the 
object, and can be followed by the user to find 
the object; the callout and its end of the leader 
line is always in view.) 

• Show a change to be accomplished in an object’s 
state. This is achieved by generating a “ghost 
image” of the object in the desired state. 

• Show an action performed on an object. This is 
achieved by adding a metaobject , such as an ar¬ 
row, to depict the action (e.g., pushing or open¬ 
ing). 

• Identify an object. If the object is visible , a 3D 
callout (textual label) is generated near the ob¬ 
ject; otherwise the find goal is activated. 

IBIS is initially provided with representations of the physical 
objects in the environment. It uses these representations both 
to generate the overlaid graphics and to evaluate it, employing 


l All goats succeed except where failure is indicated explicitly. 
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Figure 2: Overlay graphics of simple world model. 


the approaches described in [22, 9]. 

3 Testbed Application; End-User Laser Printer 
Maintenance 

To test our ideas, we have been experimenting with a simple 
end-user maintenance application for a laser printer. This 
entails relatively straightforward operations, such as refilling 
the paper tray and and replacing the toner cartridge. 

Figure 1 shows the system in use. The large white triangles 
toward the top and bottom of the figure are Logitech sensor 
transmitters, which have three small ultrasonic sources near 
their vertices. Small triangles mounted on the display and on 
the laser printer each contain three microphones. The top 
transmitter is for the head; the bottom transmitter is for the 
sensors that track the printer’s paper tray and lid. 

Figure 2 shows part of our simple laser printer world as 
displayed in our head-mounted display’s overlay bitmap with¬ 
out any design done by IBIS. In contrast, Figure 3 shows an 
example of an overlay illustration designed by IBIS, using the 
same viewing specification, to fulfill the location and find 
goals for the paper tray and the show goal for the toner 
cartridge. To achieve the location goal, the paper tray is 
highlighted (drawn solid), since the visible goal succeeded. 
To achieve the find goal, a 2D label is generated with a dotted 
leader line that terminates on the tray. This illustration also 
shows the toner cartridge—since it is occluded by the obscur¬ 
ing printer cover, it is drawn with dashed lines to make it 
visible. Figure 4 shows the paper tray, shows the desired 
action of pulling out the paper tray by means of a metaobject 
arrow, and shows the change in the paper tray’s location that 
would result as a dotted “ghost” image. 

4 System Architecture 

The original version of IBIS synchronously designs and then 
renders each illustration, and then incrementally redesigns the 
picture in response to user interaction. While the resulting 
delay is barely tolerable in fully-synthesized animation dis¬ 
played on a conventional CRT, it is unusable if synthesized 
graphics must not only change in response to head motion, 
but must be registered with objects in the real world. In 
addition, we also need to process data from a number of 
motion trackers. Based on our own experience [7] and that of 
other researchers [1, 12,17], it was clear to us that it would be 



Figure 3: Overlay graphics designed by IBIS to find and 
show location of paper tray and show toner 
cartridge. 



Figure 4; Overlay graphics designed by IBIS to show action 
of pulling out paper tray and resulting change in 
tray’s state. 


necessary to distribute processing over multiple processes and 
processors. 

The Private Eye, in its 720 x 280 resolution mode, presents an 
interface that appears as a bare single-bit-deep framebuffer. 
We created a small 3D structured display-list-based display 
server. The server supports line and text primitives, linestyle 
and font attributes, and a set of structure management 
operators that allow hierarchical objects to be created, edited, 
and deleted. Positions in the display server’s move and draw 
commands may be either 2D device coordinates or 3D world 
coordinates, and may be intermixed in a single primitive, 
allowing the creation of lines that are anchored to the screen 
on one end, and to a point in the 3D world on the other. As 
shown in Figure 5, when the system is initialized, IBIS creates 
the initial display list by sending a set of object models to the 
display server. 

Each tracker is handled by a low-level tracker process. These 
processes in turn interact with a set of object servers and a 
head server. Each object server is associated with an object in 
the real world that is monitored with a tracker. At initializa¬ 
tion, IBIS provides each object server with the identifier of 
the display server structure containing the object’s vector 
representation. It also tells the server the position and orien¬ 
tation of the object’s tracker relative to the object’s coordinate 
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Figure 5: System architecture: Initialization 

System. Similarly, IBIS provides the head server with the 
identifier associated with the scene in the display server. (The 
head and object servers actually use the same executable, so 
each must also be told whether it is tracking the head or an 
object.) 

Figure 6 shows the system’s operation after initialization. 
The head and object servers are responsible for maintaining 
the integrity of the object and head motion information that 
they represent. They edit the display list stored in the display 
server directly. Each object server regularly edits the display 
list position and orientation information associated with its 
object. The head server updates the display list viewing 
specification for the scene. Both head and object servers also 
report their information to IBIS. This avoids the delay that 
would result if IBIS were to serve as a go-between, making 
possible relatively smooth visual response to head and object 
motion, while assuring that IBIS always has the latest infor¬ 
mation on which to base its illustration design. 

IBIS determines the presence and appearance of the objects in 
the display list—all information for which it has not explicitly 
relinquished control to the head and object servers. This 
includes the specification of metaobjects , such as arrows and 
text. (IBIS can also dynamically reassign tasks to the head 
and object servers, a facility that we have not yet used.) In 
work on the automated design of multimedia presentations, 
IBIS is presented with a set of communicative goals to satisfy, 
which are output by other components that determine what to 
say and which media should be used to say it [8], In the 
research reported on here, we have instead used a much 
simpler content planner to generate the set of communicative 
goals that IBIS receives. 

IBIS first designs an illustration that satisfies the initial set of 
goals set by its content planner, obeying the constraints im¬ 
posed by the head and object trackers. Then it loops, using its 
evaluators to determine whether the current illustration design 


Figure 6: System architecture: Steady state 

is satisfactory in the face of changes in the goal set, the user, 
and the world. Whenever the illustration is determined to be 
unsatisfactory, IBIS’s rules result in modifications that are 
communicated to the display server. For example, if the 3D 
label of an object that IBIS determines must be identified no 
longer lies within the overlaid graphics window, then IBIS 
will generate a new 2D screen label and 3D leader line that 
points to the object. 

Because IBIS has relinquished to the head and object servers 
the straightforward matters of display-list traversal and up¬ 
dates of the viewing-specification and monitored object trans¬ 
formations, user interaction with the current illustration stored 
in the server is fully interactive, and occurs while incremental 
redesign takes place on a separate processor. We currently 
achieve about 15 frames per second for a model containing 70 
displayed vectors, but with some jerkiness due to network 
contention. 

The head server is responsible for instructing the display 
server to render the image. Since polling the head’s motion 
tracker requires much less time than rendering the image, we 
need to avoid building up a backlog for the display server. 
We accomplish this by having the head server send its com¬ 
mands for the current frame to the display server and then 
wait until it receives the display server’s acknowledgment that 
the previous frame was rendered. This allows rendering and 
polling to proceed in parallel. All of the actions requested of 
the display server (e.g., modifying the contents of an object) 
are processed atomically to ensure that the scene always ap¬ 
pears in a valid, drawable state. 

5 Implementation 

The components of the system run on several different 
machines under different flavors of UNIX, and communicate 
through sockets. IBIS is implemented in C++ and the CLIPS 
production system language [5], and runs under HPUX on an 
HP 9000 380 TurboSRX graphics workstation, which 
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provides a fast hardware z-buffer-based graphics accelerator 
that IBIS uses in its illustration design process. The display 
server is written in C and runs under Mach on an Intel 486- 
based PC “clone,” which supports the Private Eye display 
entirely in software. (A significant portion of the display 
server’s time is spent implementing double buffering, copying 
the graphics frame buffer to the Private Eye’s frame buffer 
and clearing it for the next frame.) The head/object server 
and lower level tracker processes are written in C and C++. 
Since the tracker hardware requires only an RS232 interface, 
and the servers can impose a large load on the machine on 
which they execute, we run them on other workstations. 

Our current 6-DOF tracker devices include three Logitech 
ultrasonic sensors [18] and one Polhemus magnetic 
sensor [16]. Our software allows both kinds of 6-DOF sen¬ 
sors to be used interchangeably, for example to trade off an 
ultrasonic sensor’s freedom from magnetic interference 
against a magnetic sensor’s ability to work without a direct 
line of sight to its source. 

One important point, that others have noted before [2], is that 
experimental interfaces that are coupled tightly to the user 
often require a fair amount of calibration. Our head-mounted 
display is no exception. Since the generated image must be 
registered with that of the real world, each user must perform 
three kinds of calibration: 

• focus . The Private Eye is focusable from less 
than a foot to infinity, with each user requiring 
separate focus adjustments based on their 
eyesight. After putting on the display, the user 
must position a slider until a calibration image is 
comfortably in focus (but see below). 

• visible area. Due to the physical relationship 
between the Private Eye, the beam splitter, and 
the current focus setting, a small portion of the 
display may not to be visible to a viewer. There¬ 
fore, we request that the user determine the view¬ 
able area by adjusting the size and position of a 
visible rectangle until it is as large as possible. 

This establishes a “safe-title” area that is com¬ 
municated to IBIS, in which IBIS can assume 
that anything that is drawn will be visible. 

• viewing specification . To register the image with 
the world, the user must first physically adjust the 
display on their head. Next, they must register a 
virtual object with its corresponding physical ob¬ 
ject, which in turn must be viewed in a known 
position relative to the user. The software takes 
into account the difference in position and orien¬ 
tation of the user’s eye and the measured position 
and orientation of the sensor. 

Registration is a serious issue: Our current motion trackers 
have neither the spatial nor angular resolution needed to 
register graphics precisely with the surrounding world. 
However, we have yet to try mapping sensor inaccuracies to 
correct for nonlinearities [21], and believe that we can also 
improve accuracy with a better calibration strategy. 


(Registration is currently off by about an inch in world coor¬ 
dinates in Figures 2-4.) 

We have also found focus to be a particular problem in our 
current application. We originally built our head-mounted 
display for a hybrid user interface , which embeds the user’s 
view of a “notebook” flat-panel display inside a partial spheri¬ 
cal information surround presented on the head-mounted 
display [10]. The flat-panel display is relatively small and 
tangent to the sphere, which is centered about the user’s head. 
Assuming that the user’s head is stationary, it is easy to adjust 
the Private Eye so that the material that it displays is focused 
at the same distance from the user as the flat-panel display. In 
comparison, “heads up” flight displays are focused at 
infinity [19], since this is effectively where out-of-the-cockpit 
targets are located. 

In contrast to both, our application requires that graphics be 
overlaid on nearby objects that necessitate constant changes in 
visual accommodation to focus in sequence. Since the Private 
Eye must be focused manually, the need for readjustment as 
the viewing distance to the object of interest changes is 
irritating. 2 There is an interesting benefit, however, to the 
precise focus control provided by the Private Eye. Even 
though we are currently using a monocular display, when the 
display is adjusted so that a small synthesized object is in 
focus at a particular distance, the illusion of it being at the 
selected position is quite compelling. (Note that one of the 
difficulties that many users experience in viewing fixed focus 
stereo displays is developing independence between their 
ocular convergence and focus accommodation.) 

6 Conclusions and Future Work 

The work that we have reported on here represents our first 
steps in designing a testbed for the knowledge-based genera¬ 
tion of maintenance and repair instructions using a head- 
mounted, see-through display. We have developed a prelimi¬ 
nary set of rules that allow us. to augment the user’s view of 
the world with additional information that supports the perfor¬ 
mance of simple tasks such as finding designated objects and 
carrying out simple actions on them. Our software architec¬ 
ture makes a clean distinction between design and rendering 
to help prevent design decisions from interfering with inter¬ 
active rendering. 

Our experience with the system has suggested many research 
directions that need to be explored. For example, one impor¬ 
tant problem is the development of a formal model of how a 
user’s performance will be affected by different decisions 
made in designing 3D illustrations, taking into account the 
purpose for which the illustration is generated (specified by 
our communicative goals), as in the 2D design work of 
Casner [3]. 


2 We have considered automating the process by using a servomotor to adjust 
the focus to that of a selected object Note, however, that focus in the entire 
overlay would be affected unifoimly. 
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Support for visible-line and visible-surface determination is 
another issue. IBIS currently bases its illustration design in 
part on whether selected objects are occluded in the current 
viewing specification, and computes these relationships itself. 
Our display server, however, does not support visible-line 
determination. We are particularly interested in incorporating 
into the display server what Kamada and Kawai [13] refer to 
as “picturing functions,” which determine how a projected 
line fragment should be rendered, based on the set of surfaces 
that obscure it. For example, while IBIS currently sets the 
graphical attributes of an entire object based in part on its 
visibility, we would like more precise control to be ac¬ 
complished by the display server, based on the visibility 
relationships of each line fragment IBIS would then be 
responsible for determining the high-level policies used by the 
display server (e.g., making obscured parts dashed). One 
challenge is to do this while still maintaining real-time perfor¬ 
mance. We believe this could be possible on our current 
hardware if IBIS were allowed to select a subset of objects 
against which the display server would perform visibility 
tests. Another intriguing research direction is to explore how 
to design support facilities that would allow IBIS to specify a 
rich set of additional high-level policies to be enforced by the 
display server. 
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aliasing we must detect situations during rendering 
where we go above this sampling step size. 

Consider the geometry of shading: 

l 


-o- 



Figure 1 

Here, ft, £, and R are unit vectors in the direction of 
the surface normal, light source, eye and reflected eye 
directions respectively. Changing any of ft, L or £ 
will change a, the a of Phong’s highlight function. If 
any of these change within a pixel when rendering, 
specular aliasing may occur (Williams, 1983). Luck¬ 
ily, unless the light source or eye are very close to the 
surface, their effect is minimal. The surface normal, 
however, has a more pronounced effect. It can change 
significantly within a pixel, thus noticeably affecting 
a. If the surface is shiny (a high value of n), the 
change in a (A a) within the pixel can exceed A a„ 
(which is smaller for larger values of n) and aliasing 
results. Consequently, to detect specular aliasing, we 
must, within each pixel, determine how much the sur¬ 
face normal ft changes and relate it to the maximum 
allowed by n. (Note: a change of ft by A radians 
changes a by 2A radians). 

The most straightforward way to compute how 
much the surface normal changes is to compute the 
normal at the comers of the pixel and find the pair 
which diverge the most (by computing the six dot 
products of the various pairs of normals and finding 
the minimum). The smallest of these six dot products, 
dot ajj, is compared to the smallest dot product allowed 
for the current value of n, dot^ (via a lookup on n 
into a pre-computed table). If it is greater than the 
value in the table, then no aliasing can occur. Other¬ 
wise, we have detected the occurrence of specular 
aliasing. We thus have derived a simple, analytic algo¬ 
rithm to detect specular aliasing on surfaces using the 
Phong highlight function. The only prerequisite is a 
good indicator of A a. Instead of using the normals at 
the comers of the pixel, another possibility would be to 
use the curvature of the surface and the size of 


intersection to compute A a. 

The above approach of finding the how much 
the surface normal changes within a pixel has to be 
modified slightly when bump mapping is used. In this 
case, one can compute the change in normal when the 
bump map normal is generated and use this value 
instead. 

The algorithm described above is conservative. 
Most of the energy in the power spectrum of cos" (a) is 
in the lower frequencies with very little near the high 
end. Consider the following figure: 



Maximum frequency 
Figure 2 

It is a frequency-frequency plot for several power lev¬ 
els. The abscissa indicates the maximum frequency in 
the power spectrum of cos" (a) (recall that the maxi¬ 
mum frequency is n radians) while the ordinate repre¬ 
sents the cutoff frequency below which a given per¬ 
centage of the power of cos" (or) resides. For example, 
if n is 500, the maximum frequency in the power spec¬ 
trum is 500 radians while 99.9 percent of the power is 
below 52 radians. Our table of minimum dot products 
could be computed more aggressively so that we don’t 
have to perform specular anti-aliasing as frequently. 

Once we have detected that specular aliasing is 
occurring we must be able to suggest a sampling rate 
that eliminates it. To do this we take the arc-cosine of 
the normal-pair dot product, dot^, (for efficiency we 
could perform a table lookup for the arc-cosine) to get 
Aa^. We compare this with A a„ and use the resultant 
ratio to indicate the oversampling rate. 

Eliminating Specular Aliasing 

We have just derived an algorithm for the detec¬ 
tion of specular aliasing. Now we have to decide what 
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to do about it. Crow’s standard solution for removing 
this aliasing can be used but it is unsatisfactory in that 
multiple shading computations per pixel per polygon 
are required; these are expensive. We will now derive 
an alternate solution that requires only one shading 
computation per pixel, a much more frugal approach. 

The purpose of anti-aliasing is to remove the 
high frequencies in a signal that cannot be represented 
by the current sampling rate. If we do not want to 
change this sampling rate we must somehow change 
the signal so that the unrepresentable frequencies are 
absent. One way of accomplishing this low pass filter 
is described by Norton, Rockwood and Skolmoski in 
the work they did on texture mapping (Norton, 1982). 
Their original signal was constructed with a series of 
sine waves, each having a different frequency and 
phase angle. Their method of anti-aliasing was to 
clamp out the sine waves that were too high to be rep¬ 
resented and only use the low frequency sine wave 
components. This approach inspired the specular anti¬ 
aliasing algorithm described below. 

A different method to perform specular anti¬ 
aliasing involves clamping n to values that will not 
introduce aliasing. By replacing n by a smaller value, 
we guarantee that the new highlight function has no 
offending high frequencies. The new value of n, ri, 
depends on the sampling rate A a^. What we are in 
fact doing is replacing the user specified highlight 
function with a duller one, one guaranteed not to alias. 
We only do this, however, in problem pixels so that in 
regions where no aliasing is occurring we use the orig¬ 
inal function. 

The simple replacement of n' for n needs a little 
enhancement before it becomes the complete anti¬ 
aliasing algorithm. Replacing cos" (a) for cos"(ar) 
removes the high frequencies from the original signal 
but in the process it also boosts the lower frequencies, 
making the surface appear brighter than before. Some 
sort of normalization is in order. This normalization is 
encorporated in the algorithm if we multiply the new 
highlight function, cos fl, (a), by the ratio of Maxi- 
mum[n] to Maximumpt']. Maximum[] is an array that 
stores the value of the first (and largest) component in 
the signal cos"(a) for various values of n. By per¬ 
forming the above normalization we try to make sure 
that the overall level of the clamped signal is the same 
as the original signal. The effects of normalization is 
illustrated in Figure 6-3. It plots the ratio of the cosine 
components of the clamped signal to those in the origi¬ 
nal signal, cos 160 (a), for various values of n', starting 
at 20 and going to 140 in step sizes of 20. We see that 


in the low frequencies the original and clamped com¬ 
ponents are identical but as we go higher up in the fre¬ 
quency spectrum the cosine components in the 
clamped function quickly fall off. 



Figure 3 

The resulting specular anti-aliasing algorithm is sum¬ 
marized in the code fragment below: 

iffffor^ >= MinDot[n]) { 

/* no aliasing */ 
n'= n ; 

A specular ~~ specular* 

) else { 

Aarfl= 2*ArcCosine(dof A fl); 
sampleFrequency= nIAa^, 
rt'= Max/tAllowedfsampleFrequency]; 

Ksptcukr = Kspecular *Maximum[n]/Maximum[n']; 

} 

Kspecular indicates the fraction of the light that the 
specular component contributes. The array MaxnAl- 
lowedU is required only if we are performing aggres¬ 
sive specular anti-aliasing. Otherwise, n' is assigned 
the value of sampleFrequency. 

Results 

The above algorithms for detecting and eliminat¬ 
ing specular aliasing were implemented in a z-buffer 
rendering system. The z-buffer visible surface algo¬ 
rithm was chosen because it would guarantee that any 
anti-aliasing observed would have to come from the 
clamping algorithm. Adding the clamping algorithm 
to the z-buffer Tenderer was straightforward. The 
biggest implementation hurdle encountered was 
changing the tiler to have it compute dot^. The three 
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arrays, MinDot[], Maximum □ and Max/iAllowedD, 
were pre-computed for efficiency. Plate 2 shows four 
cylinders, each with identical surface properties but 
different sizes and orientations computed to a resolu¬ 
tion of 256 by 256 pixels. The large central cylinder 
and the one on the lower right exhibit severe specular 
aliasing. The cylinder on the upper left is also exhibit¬ 
ing specular aliasing even though none is visible at 
present. For if it is moved slightly, aliased highlights 
will appear on its surface. This is also true of the 
cylinder in the upper right. Plate 3 shows the same 
scene rendered using the highlight clamping algorithm. 
The highlight down the central and lower right cylin¬ 
ders are now smooth with no jaggies present. A high¬ 
light is faintly visible on the cylinder in the upper left. 
Now, even if the cylinder is moved, no highlight will 
flicker on and off. The cylinder on the upper right has 
no highlight visible as it is too faint. There is so much 
curvature in this cylinder that any visible highlight 
would cause aliasing due to the severe undersampling. 
Plate 4 shows the same scene rendered with more 
aggressive clamping (99.9 percent power). The high¬ 
lights are brighter and not as spread out and aliasing is 
not noticeable. Plate 5 shows a scene with two cones 
with the one on the right being highlight anti-aliased. 
The cone shape is useful in that it illustrates a continu¬ 
ous transition from low curvature to high. As can be 
seen, the transition into the high curvature region is 
smooth when anti-aliased. 

Conclusion 

We have introduced a simple analytic algorithm 
that, given the change of a within the pixel, detects 
when specular aliasing is present and indicates a sam¬ 
pling rate to overcome it. We have also introduced a 
very fast and simple algorithm that removes specular 
aliasing without increasing the sampling rate. 

I would like to thank Xerox PARC, and espe¬ 
cially Frank Crow, for providing the facilities and sup¬ 
port for much of this research project. Also, thanks to 
NSERC for their continuing support. 
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Abstract 

A relaxation method is presented that generates a series 
of spatially distributed stochastic samples following the 
Poisson disk distribution over a range of scales. The 
series is structured so that a prefix subsequence forms 
a lower frequency sampling pattern that is also Poisson 
disk distributed. Unlike strict dart-throwing, the relax¬ 
ation procedure is guaranteed to terminate. The hier¬ 
archical structure allows straightforward adaptive sam¬ 
pling, using techniques analogous to ordered dithering. 
Also presented is an efficient method to optimize the lo¬ 
cation of samples under a minimum root mean square 
quantization error criterion. 

Resume 

Une methode de relaxation est presentee qui genere une 
serie d’dchantillons distribute de fagon aleatoire selon 
une distribution de disque Poisson pour une gamme 
d f echelles. La serie est construile de fagon d ce que 
les premiers tchantillpns correspondent aussi d une dis¬ 
tribution de disque Poisson, mais d’une frequence plus 
basse. Contrairement d la methode du lanct de darts, la 
methode presentte termine tour jours. La structure hier- 
archique permet un dchantillonage adaptatif simplement 
en se servant d*une technique semblable d celle connu 
sous le nom de ordered dithering. Une methode effi- 
cace pour optimizer le placement des echantillons dans 
le sens du valeur quadratique moyenne est dgalement 
presentde. 

Keywords: ray tracing, stochastic sampling, Poisson 
disk, adaptive sampling, dithering, optimization, 
antialiasing. 

1 Introduction 

Stochastic sampling has arisen as an important tech¬ 
nique for the estimation of multidimensional integrals 
in computer graphics [Coo86b, Mit91, LRU85, Mit91], 


Area integrals are implicit in the solution of such prob¬ 
lems as antialiasing, soft shadows, and the simulation of 
finite camera apertures (causing depth-of-field effects). 
Integration over time is needed for motion blur. All 
these problems can be treated together, resulting in 
a large multidimensional domain of integration. Ray 
tracing in particular has benefited from this approach 
because of its inherent point sampling limitations. 

Error bounds can be derived from the variability of 
the samples: variance, contrast, etc. An approximate 
error bound on the results of stochastic sampling using 
variance is given by 



where N is the number of samples, s 2 is the variance 
of the sum, and V is the volume of the domain of in¬ 
tegration [LRU85, PFTV89]. This error relationship is 
typically used in numerical applications of the Monte 
Carlo (stochastic sampling) method. Note, however, 
that it only decreases as a function of y/N. An error 
bound based on contrast is given in [Mit87j; this should 
more closely match the psychophysical characteristics 
of the human visual system. Contrast is defined as 

q _ -/max -Lnin 

■Lnax + I min 

These expressions are particularly useful in adaptive 
sampling techniques, where only enough samples are 
taken to satisfy an error bound in a local region. For 
the contrast measure, supersampling can be initiated if 
the contrast is above a certain bound, indicating high 
variability. 

Research has focused on the problem of choosing sam¬ 
ple positions so as to minimize the impact of this in¬ 
evitable error while minimizing the number of samples. 
Both can be accomplished by carefully choosing the po¬ 
sition of the random samples. 

Instead of structured artifacts such as moir4 patterns 
or staircasing (“jaggies”), aliasing error due to under¬ 
sampling appears as noise. The distribution of random 
samples shapes the frequency spectrum of this noise. 
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Visually, low-frequency noise is particularly objection¬ 
able, appearing as “structure” in the image. Also, a 
low-pass reconstruction filter will typically be applied 
after the sampling, filtering out high-frequency noise; 
often this filter will just be a summation of the sam¬ 
pled values (a box filter). For both these reasons, the 
sampling strategy should try to concentrate the noise 
in high spatial frequencies. 

Mitchell [Mit9l] has recently studied the problem of 
choosing samples across all dimensions of the rendering 
problem to optimize the spectral characteristics of the 
noise on the resulting image. This study has some re¬ 
sults in common with his; in particular, he gives an algo¬ 
rithm which will generate hierarchical distributions very 
close to the ones we give in Section 4, and can be used 
as an alternative to the algorithm given in that section. 
However, this paper also presents a new and very effi¬ 
cient approach to high-quality adaptive sampling using 
these distributions, and presents an efficient optimiza¬ 
tion algorithm that can be applied to other problems 
such as the quantization of normals [McC91] or gener¬ 
ation of random meshes [Tur91]. 

We will focus on the sampling problem in two spatial 
dimensions, which applies to the subproblems of area 
light sources, antialiasing, and camera aperture simula¬ 
tion. 

The problem of selecting a spatial sampling distri¬ 
bution has been studied in graphics [CPC84, DW85, 
Coo 86 b, Coo 86 a, CPC84, Mit87], dithering [Uli87], and 
coded aperture reconstruction [BS81]. The analysis in 
this paper will follow that in [Uli87j; we will analyse 
distributions empirically using frequency-domain tech¬ 
niques. 

Totally random selection of points is not a good solu¬ 
tion, as will be shown in Section 3.1. Other approaches 
to this problem have included jittering samples from a 
periodic grid, n-rooks, and the Poisson disk. Because of 
space restrictions, not all of these sampling approaches 
have been analyzed in this paper. The following sections 
will focus on optimizing the Poisson disk approach after 
first introducing some analysis tools. 

There are indications that the Poisson disk distribu¬ 
tion is one of the best from a spectral point of view. 
In comparison with jittering it is difficult and expensive 
to generate and use: the following analysis assumes a 
table-based implementation. 

2 Analysis Tools 

We need a set of tools to analyse the frequency domain 
characteristics of different sampling patterns. These 
tools will reduce the response to two one-dimensional 
graphs, radial power and radial anisotropy. Radial 
power will measure power at each spatial frequency, 
regardless of orientation; anisotropy will measure the 
variability of this measure about all orientations. 

An empirical analysis will be used. It is often sim¬ 
pler to see how a sampling pattern actually behaves 
than to attempt to derive its response, since any deriva¬ 
tion will require some simplifying assumptions that may 


limit the applicability of the solution. For example, the 
sequential generation of samples using dart throwing 
(Section 3.2) is usually not modelled. 

We assume that radially symmetric responses are ap¬ 
propriate. Alternatives are possible, i.e. highpass two- 
dimensional filters with a diamond-shaped stopband. 
These should be simple extensions of the approach out¬ 
lined here. 

A distribution of samples can be represented by a 
set of impulses in the plane. To analyse the frequency 
domain response, the periodogram of the distribution is 
evaluated and measures of power and power variance in 
a set of circles about the centre are computed. 

The periodogram is the Fourier transform of the au¬ 
tocorrelation function. The autocorrelation 17 of a real 
signal / is defined as 

r/(i7) = m * H-Z) = J f(Z)f(Z- fids 

For real signals the Fourier transform of the autocorre¬ 
lation function is equal to the square of the magnitude 
of the Fourier transform of /(£): 

Rf(<2) = nrjm = \r[f{z))\ 2 , 

according to the complex identity zz* = \z\ 2 , the convo¬ 
lution theorem, and the time-reversal theorem. Using 
the periodogram will allow the power and anisotropy 
at different frequencies to be evaluated, since we can 
compute the anisotropy at any radius. At points closer 
to the origin, energy in the periodogram corresponds to 
low-frequency energy in the autocorrelation, and there¬ 
fore long-distance spatial correlation. Since the peri¬ 
odogram is also the spectrum squared, low frequency 
power also corresponds to low frequency, or large-scale, 
structure in the function /(£). 

We want to avoid low-frequency structure in our sam¬ 
pling patterns because this structure could be erro¬ 
neously perceived as structure arising from the source 
image. The validity of this quality measure depends 
on the quality of the reconstruction filter, which should 
reject high frequencies. It should be noted that a box 
filter has a sinc(u>*)sinc(u;j,) frequency response, which 
allows some leakage at high frequencies and is conse¬ 
quently not the best choice. 

To evaluate the periodogram of a distribution, it is 
necessary to find the mean periodogram of samples 
drawn from the distribution. In the following analy¬ 
sis, average periodograms are computed by summing to¬ 
gether 100 periodograms of sample images drawn from 
each distribution. Each image has a resolution of 32x32 
and contains 64 impulses, a density of 1/16. Images 
are defined over the unit square. An impulse is repre¬ 
sented by a value of 1 at the appropriate location in 
the image. Note that the following approximations and 
assumptions are made in this analysis: 

1 . The impulses are finite-width. 

2 . The location of the impulses are rounded to the 
nearest grid location. 
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3. A finite-resolution grid is used. 

4. The distribution on the unit square is assumed to 
tile the plane periodically. 

5. A finite number of periodograms are used to esti¬ 
mate the mean periodogram. 

Most of these approximations are made so that the dis¬ 
crete Fourier transform, in its efficient FFT form, can 
be used to evaluate the periodograms. 

Note that the assumption that the distribution tiles 
the plane periodically is exactly the situation in many 
uses of the distribution, such as stochastic oversampling 
in ray tracing. The period of the pattern should be 
much larger than the pixel size. 

Once the average periodogram Rf has been evalu¬ 
ated, we can reduce the information to two graphs by 
computing radial statistics. We define a set of annulli 
as in Figure 1. Within each annullus, the mean radial 
power Pi is computed as well as the variance s?. This is 
the variance within an annullus, not between the sam¬ 
ples used to create the average periodogram: 

Pi = — I I Rf (/ cos 9, f sin 0)f df d9 

Ai Jo Jfi 

i p2ir rJi+i 

s 2 i = J-J J (iZ/(/ cos 0, /sin 8) — Pi) 2 f df d$ 

where Ai = tt(/, 2 +1 — f 2 ) is the area of annullus » defined 
by radii /; and /;+i. In practice, the “area” is given by 
the number of samples on the discrete 32 x 32 grid that 
fall within the bounds of a given annullus. The sam¬ 
ple size increases approximately linearly up to the edge 
of the square, then decreases for the annulli that have 
samples only in the corners of the periodogram. For 
this analysis, 20 equally-spaced annulli from the centre 
(0 frequency, or DC) to the corner (highest possible fre¬ 
quency) were used. Figure 2 shows the sample size at 
each frequency. A dotted line has been placed where 
the edge of the square is first encountered, and will be 
present on all diagrams in which it is relevant. 



Figure 1 : Annulli defined to average the periodogram 
radially. Twenty annulli were used for this analysis. 



Figure 2 : Sample size as a function of radial frequency 
for estimation of radial statistics in each annullus. 


Two graphs will be shown for each distribution anal¬ 
ysed: the mean radial power Pr(f) and the anisotropy 
A r (f) defined as 


Mf) 


an 

PHfY 


Larger values of anisotropy indicate a greater uneve- 
ness in the radial distribution of power. Power and 
anisotropy vary over a wide range. For this reason they 
will be plotted in decibels; a value x is expressed in 
decibels according to the relation XdB = —20 log 10 (a;). 
A doubling of power or anisotropy approximately cor¬ 
responds to a 6 dB change. 

The periodograms will also be displayed in three- 
dimensional graphs. In these graphs, the DC peak has 
been reduced by a factor of 4 and actual power is shown, 
not decibels. These plots are for qualitative comparison 
of sampling distributions. 


3 Random Distributions 

Several random sampling patterns have been studied. 
The one most used in practice is jitter sampling, which 
moves samples off a uniform grid by a random amount. 
This technique is simple to implement, but is not spec¬ 
trally optimal. 

This section analyses only Poisson disk sampling and 
derivatives, which have a better frequency response 
than jitter (less low frequency noise). They also have 
another advantage over jitter sampling: an arbitrary 
number of samples may be used. 

The following subsection first analyses strict random 
sampling for comparison purposes. 

3.1 Poisson 

The simplest random distribution is the so-called “Pois¬ 
son” distribution. The coordinates of a point are se¬ 
lected at random from a uniform distribution. Every 
point is completely independent of the others, making 
the generation of points a Poisson process. The number 
of points in an area A is Poisson-distributed with mean 
equal to the area times the density. For this reason, the 
distribution is often called a Poisson distribution. This 


Graphics Interface ’92 





97 


unfortunate terminology should not be confused with 
the univariate discrete Poisson distribution. 

A sample two-dimensional Poisson distribution is 
shown in Figure 3. The tile outlined in the centre of 
the diagram tiles the plane periodically in a square ar¬ 
ray; parts of adjacent tiles are shown. All distributions 
in this study will be shown this way. 

The centre tile shown here would extend over sev¬ 
eral pixels if this sampling pattern were being used for 
antialiasing. 

The periodogram of this distribution, as well as an 
average of 100 other instances, is shown in Figure 4. 
Radial averages for these distribution are shown in Fig¬ 
ure 5. Anisotropy is shown in Figure 6. 



Figure 3: Uniform distribution of impulses on the unit 
square. 




Figure 4: Sample and Average (n = 100) periodograms 
for the Poisson distribution. The DC peak has been 
reduced by a factor of 4. 
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Note how clumpy the spatial distribution in Figure 3 
appears; this is due to the image having significant 
power at all frequencies, as we can see in looking at 
the periodogram. The clumps arise because the low fre¬ 
quency power in the spectrum of the distribution allows 
a variation in the local average of intensity to occur. 
However, we see that the periodogram is a reasonably 
good approximation to an impulse function. 

The distribution is seen to have an even amount of 
anisotropy over all frequencies, which is desirable for our 
application; we do not want any preferred direction in 
the noise. The dumpiness, however, will undoubtably 
give poor performance. To eliminate the dumpiness we 
will have to eliminate the low frequencies in the spectral 
response. 
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(dB) 
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Figure 5: Radially averaged power for the spatial Pois¬ 
son distribution. Top: single sample distribution; bot¬ 
tom: average (n = 100) radial power. 
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Figure 6: Radially averaged anisotropy for the spatial 
Poisson distribution. Top: single sample distribution; 
bottom: average (n = 100) anisotropy. 

3.2 Poisson Disk 

The Poisson disk distribution is a Poisson distribution 
in which no two points are closer than a minimum 
distance 1 . 

In theory, to get a sample of a Poisson disk distribu¬ 
tion one must generate Poisson distributions and eval¬ 
uate each one according to the minimum distance re¬ 
quirement. This is not a very practical algorithm, and 
effective approximations have been developed. 

The dart-throwing algorithm [Coo86b] places points 
sequentially. Each new point is compared to points al¬ 
ready placed; if it is too close it is discarded. The al¬ 
gorithm terminates after a specific number of points 
have been placed, or it has proven impossible to place 
new points after a large number of attempts. A Pois¬ 
son disk distribution in the plane, generated using dart- 
throwing, is shown in Figure 7. Circles are drawn about 
each point in the centre tile at half the minimum inter¬ 
sample spacing. A separation (disk diameter) of 0.1 
that would result in a reasonable run-time was chosen 
experimentally. Intertile interference was also checked 
so that the periodic tiling of the plane would also satisfy 
the Poisson disk criterion. 

Another class of algorithms that generate similar 
distributions are the error diffusion algorithms [FS75, 
Uli87]. These have been used in ray tracers [Mit87], but 

J the Poisson disk distribution is characteristic of the recep¬ 
tors in the retina outside of the fovea [YJ83, WC83], where it 
prevents aliasing; the retinal image is subsampled in those re¬ 
gions. Within the fovea, the receptors are hexagonally packed 
at a density twice the highest frequency of the highest spatial 
frequency passed by the cornea/lens/iris system, so the Nyquist 
criterion is satisfied. 


Figure 7: Poisson disk distribution in the plane. Center 
tile drawn with disks having the same diameter as the 
minimum intersample spacing. 

are limited to a discrete raster. It would also be possi¬ 
ble to generate an adaptive sampling grid using an error 
diffusion algorithm on the grey-scale error image after 
the first pass of a ray tracer. However, error-diffusion 
dithering still has some artifacts, particularly in con¬ 
stant regions [UH87]. We will not use an error-diffusion 
algorithm, but will use another technique adapted from 
ordered dithering in Section 5 when we discuss adaptive 
sampling. 

It has been found that the Poisson disk distribution 
has desirable spectral properties, namely low energy for 
low frequencies. Sample and average periodograms for 
this distribution are shown in Figure 8. The radially- 
averaged statistics for this distribution are shown in Fig¬ 
ure 9 and Figure 10. 

4 Relaxation 

Following a suggestion by Robert Lansdale (docu¬ 
mented in [Lan9l], and also referenced obliquely in 
[DW85]), the minimum radius for the distribution re¬ 
quired by dart-throwing does not have to be fixed. 

Points are placed starting with a large radius initially. 
Once no more space has been found at this radius for 
a certain (large) number of attempts, the radius is re¬ 
duced by some fraction (which is just less than 1). The 
final distribution will still have a minimum separation 
given by the last radius used. A magnification fraction 
is also specified to increase the number of tests as the 
number of samples already placed increases. To save 
tests, the best-fitting sample during each iteration can 
be saved. Eventually, the minimum separation of this 
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Figure 10: Anisotropy for the Poisson disk distribution. 
Top: single sample distribution; bottom: average (n = 
100) distribution. 


Figure 8: Sample and average (n = 100) periodograms 
for dart-throwing Poisson disk distribution. The central 
DC peak has been reduced by a factor of 4. 
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Figure 9: Radial average power for the Poisson disk 
distribution. Top: single sample distribution; bottom: 
average (n = 100) distribution. 


sample and the decaying radius will meet. In this way 
work from previous tests will not be wasted. 

Such a distribution is shown in Figure 11. The circles 
surrounding each point in the centre tile indicate the 
radius in use when that point was placed. Visually, the 
distribution generated is similar to that generated by a 
fixed radius. 

A similar approach was taken by [Mit91], but was 
not exploited for its adaptive sampling potential. Our 
algorithm relaxes the radius condition only when nec¬ 
essary, obtaining the best possible fit for the number of 
samples considered. Unfortunately, this may result in a 
longer running time. We will show empirically that the 
spectrum retains the desirable properties of the original 
Poisson disk distribution, at least over a limited range 
of scales. The algorithm differs from Mitchell’s in two 
ways: a hard radius constraint is used; and the radii 
are guaranteed to be strictly decreasing (the last point 
could be addressed by sorting the samples generated by 
Mitchell’s algorithm). 

The new algorithm has many advantages. First, it 
will always terminate with a position for any desired 
number of samples, since the radius should eventually 
get small enough to allow every one to be placed. It 
is very possible that a dart-throwing algorithm with a 
fixed radius will not allow all samples to be placed, and 
will therefore fail to produce a distribution. 

Second, this algorithm is less sensitive to its initial 
choice of parameters than strict dart-throwing. A larger 
threshold of trials or a larger decay fraction will increase 
running time and (hopefully) increase the “tightness” 
of the distribution, but will not cause the algorithm 
to fail totally. There might, however, be some regions 
which are more densely packed than others. This can 
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Figure 11: Approximation to a Poisson disk distribu¬ 
tion generated by relaxation dart-throwing. The circles 
about points on the left indicate half the minimum in¬ 
tersample separation in use at the time the point was 
placed. 


happen if the radius decay is too steep or the number 
of tests per iteration too few. Since the algorithm is 
adaptive, the same set of parameters can be used to 
generate distributions for a wide range of circumstances. 

Finally, and probably most importantly, the resulting 
list of samples has a pyramidal property. Only one high- 
resolution sample distribution has to be generated for 
a wide range of sampling densities; if a lower sampling 
density is required, then only the initial points from 
the sample list are selected. These points will satisfy a 
larger minimum distance criterion than the full set. 

In ray tracing, a common problem when using 
stochastic sampling is how to adaptively improve the 
sampling density. In the past, jitter sampling has been 
used rather than Poisson disk sampling because of the 
expense of generating each distribution. In [Mit91], 
Mitchell suggests scaling down the sample grid and 
replicating it; this, however, will introduce some un¬ 
wanted periodicity which may compromise the effective¬ 
ness of the random grid and reintroduce coherent noise. 
This will only be a problem if a short period is used. 
With this algorithm, a single long list can be used and 
the required number of samples chosen, resulting in a 
consistent long period. 

We need to analyze this distribution quantitatively. 
It will be shown that, empirically, its spectral properties 
closely match those of the fixed-radius dart-throwing al¬ 
gorithm. The distribution in Figure 11 was generated 
with an initial minimum spacing of 0.3 (disk radius of 


0.15). The radius was multiplied by 0.99 after 1000 sam¬ 
ples were tried without any success in placing a new 
point. Its periodogram is shown in Figure 12, along 
with an average periodogram of 100 other distributions 
generated with the same parameters. The radially- 
averaged statistics for this distribution are shown in 
Figure 13 and Figure 14. 



Figure 12: Periodograms for Poisson disk distributions 
generated by relaxation. Top: single sample distribu¬ 
tion; bottom: average (n = 100) periodogram. The 
central DC peak has been reduced by a factor of 4. 

The sequence of radii for the samples is given in Fig¬ 
ure 15, as well as the average over the 100 tested distri¬ 
butions. Note that only the final samples approach the 
radius limit of 0.05 used for the basic dart-throwing in 
Section 3.2. 

5 Adaptive Sampling 

5.1 Continuously Variable Sampling 
Density 

In many cases, it is desirable to locally modify the 
sampling density. For example, in the Introduction a 
formula was given for estimating the error, which de¬ 
pended on the variance of the samples collected. Se¬ 
lective supersampling corresponds to discrete levels of 
sample density, and can be triggered by a set of con¬ 
trast thresholds. Using the pyramidal property of the 
relaxation-generated sequence, as many samples can be 
taken as needed, until an error criterion is satisfied. Ev¬ 
ery new sample will fit into a Poisson disk distribution, 
at ever-decreasing minimum spacing. 
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Figure 13: Radial average power for the Poisson disk 
distribution generated by relaxation. Top: single sam¬ 
ple distribution; bottom: average (n = 100) distribu¬ 
tion. 


Instance 

Relaxation 

Radial 

Anisotropy 

(dB) 



Radial Frequency at Centre of Annulus 


20 

Average ^ 

Relaxation 
Radial 0 

Anisotropy n 

(dB) 

-20 

Radial Frequency at Centre of Annulus 



Figure 14: Anisotropy for the Poisson disk distribution 
generated by relaxation. Top: single sample distribu¬ 
tion; bottom: average (n = 100) distribution. 
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Figure 15: Sample Radii for the Poisson disk distribu¬ 
tion generated by relaxation. Top: single sample distri¬ 
bution; bottom: average (n = 100) distribution. 


A set of subsequences can be extracted and stored; 
each subsequence has samples that fall into predeter¬ 
mined partitions. The refinement can then be localized 
to only the partitions whose error estimates exceed a 
bound. The ultimate extension of this approach is im¬ 
portance sampling. 

5.2 Importance Sampling 

If the approximate shape or some factor of the function 
to be integrated is known, then the number of sam¬ 
ples should be increased in areas in which the function 
has a high value. The sampling density then replaces a 
weighting factor. Many examples of this occur in com¬ 
puter graphics. 

For example, consider the ray tracing simulation of 
a glossy surface. The energy reflected towards the eye 
is actually an integration over the hemisphere of the 
incoming energy, multiplied by the reflectance function. 
The reflectance function is known; the incoming energy 
is not. The integral can be performed using stratified 
Monte Carlo sampling. Random samples are shot in a 
distribution of orientations, but more samples should 
be shot in directions from which incoming energy will 
be strongly reflected. 

For another example, it may be possible to replace 
a complex reconstruction filter with a simple box fil¬ 
ter (mean of samples in a pixel) but to use importance 
sample according to the weight given by a higher-quality 
filter. This would have the effect of concentrating more 
samples in the center of the pixel. 

Importance sampling can be trivially solved using an 
approach from ordered binary dithering. In ordered 


Graphics Interface ’92 










102 



Figure 16: Adaptive Poisson disk sampling; density 
weighted by exp(—5(x 2 -f* y 2 )). 


dithering, a mask is used which is an array of thresh¬ 
olds, one per pixel. A full mask is usually generated by 
repeating some basic pattern. At any point, if the pixel 
in the source image is greater than the threshold at that 
point in the mask, then the pixel is turned on. 

A similar idea can be used if we have previously gen¬ 
erated a large Poisson disk distribution with uniform 
density using relaxation. We know that the more sam¬ 
ples chosen from the list and displayed, the denser the 
samples will be, although they will all follow a Poisson 
disk distribution at some scale. Therefore, select the 
samples according to the following algorithm: 

1. For each sample si, compute its rank as r(s,) = 
i/n, 1 < i < n. 


We would like the distance from every point on the 
plane to the nearest sample point to be as small as pos¬ 
sible. If the function to be estimated varies smoothly, 
then this strategy will result in the minimum root mean 
square (RMS) error. It will also mean that no large 
“gaps” will be left in the sampling structure. 

We know the optimal solution to this problem, at 
least on the plane: a periodic distribution of samples in 
a hexagonal grid. However, we want to avoid periodic¬ 
ity. This should not be a problem, since optimization 
procedures that can find global optimums are rare. In 
this case, we actually want an optimization procedure 
that will return a somewhat suboptimal solution close 
to our initial configuration. We can then initialize with 
a Poisson disk distribution and hopefully derive a result 
which still retains most of its useful properties. 

The Generalized Lloyd Algorithm (GLA) described 
further in [Llo82, JS86, For88] is based on a minimiza¬ 
tion of the (root) mean square error in quantizing each 
position in a multidimensional metric space to the near¬ 
est sample point. It is a strictly descending method and 
as such is unlikely to find global optimums in complex 
situations. 

Given x distributed according to probability den¬ 
sity function p(£) over a space with distance metric 
the mean square error is: 


*mse 



p(x)d 2 (x, X{) dx , 


2. Let 

x(g .) _ m - min (/) 

max(/) — min(/) ’ 

where / is the known importance function. In the 
first example above, this would be the value of the 
reflectance function; in the second, it would be the 
weight of the reconstruction filter. 

3. If r(si) < x(s»), choose si as an evaluation sample 
point; otherwise, discard it. If a sample is chosen, 
it is part of the pattern needed to represent the 
density x. 

This is very efficient, since the same table of samples 
can be used over and over for a variety of importance 
functions, as long as care is taken to not allow overly 
periodic use of the same sampling pattern. 

An example of a set of samples whose density is 
weighted by exp(—5(x 2 + y 2 )) over [—0.5, 0.5] 2 is shown 
in Figure 16. A subsequence of 227 samples were se¬ 
lected out of 512. 

Once a subsequence has been generated, it can still 
be used in the manner described in Section 5.1, since it 
will still satisfy the increasing density criteria. Only as 
many sample points are taken as are needed. 

6 Optimization 

Although a Poisson disk distribution has some nice 
qualities, it is possible to generate a better one from 
a quantization viewpoint. 


where V t is the domain of x,: the set of all points that 
map to Si. Note that both probability and error are 
accounted for in this metric. This metric more heav¬ 
ily weights extremes of error, while not neglecting the 
mean error rate. The mean square metric is convenient 
analytically because many useful results can be proven 
using it. 

Two results that are useful in the design of quantiz¬ 
ers give necessary (but not sufficient) conditions for an 
optimal MSE quantizer. In an optimal MSE quantizer, 

1. Borders between two domains will be equidistant 
from the respective quanta under the distance met¬ 
ric d. 

2. Quanta will be placed at the mass centres of their 
domains, where the mass centre of a domain is de- 
fi ned as 

_ f Vt xp(x)dx 
fv, pW di 

In other words, the quanta should be the average 
value of their domain. 

These conditions are easily proven. 

Condition 1: Every point x with a non-zero prob¬ 
ability should be mapped to the quantum which it is 
closest to under the distance measure d. Obviously, if 
it was not, the error would be higher and the mean 
square error metric larger. The boundaries are there¬ 
fore equidistant under d } since points on the boundary 
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can choose either quanta with no penalty in the error 
metric. If this was not true of the boundary, then the 
points on this boundary could be moved to the lower 
error domain and the metric reduced. 

Condition 2: The value f v p(x)d 2 (x,x,) dx is the mo¬ 
ment of inertia of p{T>i) around the point x,; it obtains 
its minimum when x, is Zi, the mass centre of p(Z\). 

The same results hold if the distance measure is angle 
on the surface of a sphere, with appropriate modifica¬ 
tions to definitions (the edges of the domains will be 
great circles, for example). 

The GLA alternates attempting to satisfy the above 
conditions. First, a set of initial quanta are chosen. 
Boundaries between domains are set using condition 1, 
and then the mass centres of each domain are computed. 
The quanta are moved to the mass centres to satisfy 
condition 2, which of course changes the boundaries and 
invalidates condition 1. The process is repeated until no 
more changes are necessary, assuming the Si converge. 

The optimization procedure can be used to generate 
a non-uniform distribution; simply vary the p(S) term 
when computing the centroid. The probability distri¬ 
bution p(S) should be smooth or the optimization may 
get wedged into a local minimum. 

This technique will converge faster if the initial distri¬ 
bution is nonuniform. Such a distribution can be gen¬ 
erated by the algorithm in Section 5. Alternatively, 
relaxation can be used directly, but the radius modified 
by the weighting function. 

The positions of the quanta and the domain bound¬ 
aries may not converge and may in fact go to infinity, 
depending on the shape of the probability distribution. 
This problem is avoided in the current problem by op¬ 
timizing on a finite domain; the sides of the rectangle 
wrap around and so the optimization effectively takes 
place on the surface of a toroid. 

Each iteration can be implemented by constructing 
a Voronoi diagram and finding the centre of mass of 
each cell. Construction of each diagram takes O(nlogn) 
time, although in a sophisticated optimization proce¬ 
dure the diagram could be built incrementally; only 
small changes are required for each new iteration. 

To generate the results shown here, the Monte Carlo 
method was used to estimate the centroid of each cell. 
This is an inefficient method compared to the Voronoi 
approach, but is simple to implement. 

In other uses of this optimization scheme, a globally 
optimal solution may be important. Even if the posi¬ 
tions of the quanta do converge, there is no guarantee 
that the quantizer thus derived will be globally optimal. 
In fact it is unlikely in complex situations, since the 
method is descending. If an optimal solution is desired, 
decaying noise can be added to simulate annealing. This 
will not be too expensive because the underlying opti¬ 
mization procedure is accurate. 

A distribution optimized using a uniform probability 
distribution is shown in Figure 17. Ten iterations were 
used. As with the random distributions, 100 of these 
distributions were analysed and the results appear in 
Figures 18, 19 and 20. The average spectra shown in 
Figure 18 displays peaks corresponding to the horizon¬ 


tal and vertical directions. The danger of using this 
technique is that the optimization may actually find a 
global optimum and regain periodicity. An optimized 
distribution might be combined with jitter to avoid this 
problem. 



Figure 17: Optimization of a uniform distribution gen¬ 
erated by dart-throwing. The circles about points on 
the central tile indicate half the minimum intersample 
separation in use at the time the point was placed. 


7 Conclusions 

A number of refinements in the generation and use of 
the Poisson disk stochastic sampling strategy have been 
presented and analysed empirically. It has been shown 
that by using relaxation, a sequence of samples can be 
generated such that prefix subsequences are also Poisson 
disk distributed. Sequences with this property can be 
used in both adaptive and importance sampling in a 
very efficient manner. 

Unfortunately, to use these distributions in practice 
still requires large lookup tables, since the distributions 
were generated using a refinement of dart-throwing; 
the generation process is still relatively expensive and 
should not be included within the rendering loop. 

An optimization technique has been presented which 
has uses in sampling and quantization. It will also be 
useful in other areas where points have to be placed 
uniformly within a finite domain, such as mesh gen¬ 
eration over arbitrary smooth shapes. It can optimize 
sample positions to fit an arbitrary density distribution, 
although a smooth distribution has more chance of suc¬ 
cess. 
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Figure 18: Periodograms for optimized distributions. 
Top: single sample distribution; bottom: average (n = 
100) periodogram. The central DC peak has been re¬ 
duced by a factor of 4. 
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Figure 19: Radial average power for the optimized dis¬ 
tributions. Top: single sample distribution; bottom: 
average (n = 100) distribution. 
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Figure 20: Anisotropy for the optimized distributions. 
Top: single sample distribution; bottom: average (n == 
100) distribution. 
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Performing In-place Affine Transformations in Constant Space 
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Abstract 


Affine transformations of 2-D frame buffer images are a 
common computer graphics operation. Such 
transformations take a rectangular raster of image 
memory, perform some affine transformation (e.g. scale, 
shift, shear, rotate) upon it, and write the result into some 
other rectangular raster of image memory. 

If the source and destination share the same memory, the 
operation is termed in-place. Previous in-place affine 
transformation algorithms on an m by n region required 
0(max(tf?,tf)) space for internal buffers. The algorithm 
presented here requires 0(1) (constant) space: this allows 
in-place affine transformations to be performed on large 
images on processors with small memory. 

Keywords: affine transformations, Catmull-Smith, frame 
buffer algorithms. 


source image 



transform 


//C 



source map 
destination image 
written 
clipped 

cleared 


1. Introduction 


Figure 1: An afiine transformation 


An affine transformation is a transformation of the form 


x' = Ax + By + C, 
y‘ = Dx + Ey + F , 

for arbitrary real values A,B,C,D,E , and F. 

Rotations, scales, shifts, and shears are all affine 
transformations. A computer graphics image is 
commonly considered as a rectangle whose contents are 
an array of pixels. An affine transformation can therefore 
be defined upon an image by performing the 
transformation upon the rectangle and then resampling 
[Catmull80]. 


2. Previous Work 

2.1 The naive algorithm 

A general affine transformation can be most easily 
implemented by creating a temporary image the same size 
as the source, and computing the value at each 
destination pixel in this temporary buffer. After all values 
have been computed, the temporary image over-writes 
the source image and is then freed. This approach 
requires 0(m n) space when presented with an m by n 
source, and is only presented for comparison: it is not 
used in practice. 

2.2 The Catmull-Smith algorithm 

The popular Catmull-Smith algorithm [Catmull80] works 
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by decomposing the affine transformation into two 
perpendicular shears. In the first shear the x* are 
computed, while the y coordinates are left untouched. In 
the second shear, the y' values are computed. To 
minimize artifacts the order of the two passes may be 
reversed, and transpositions or reflections may be 
required. 

Each shear can be performed a scanline at a time, as y is 
unchanged by the x shear and x is unchanged by the y 
shear. Therefore, to transform an m by n image, a 
temporary scanline buffer of size 0(max(m,w)) is 
required. 

3. Motivation for the New Algorithm 

The Catmull-Smith algorithm has become a fundamental 
and accepted part of computer graphics software 
environments, with only minor extensions [Fraser85, 
Smith87] in the twelve years since its original publication. 
Of what interest is a new algorithm based upon it? There 
are both theoretical and practical motivations for the new 
algorithm. 

Theoretical : Just as the Catmull-Smith algorithm 
was motivated by the inability to process an entire 
image in-core, this paper presents an algorithm 
which is motivated by the inability to process an 
entire scanline in-core. An extension to a common 
O(n) algorithm which uses 0(1) space may be of 
some theoretical interest. 

Practical: As computer graphics matures, it has 
begun to tackle more sophisticated problem areas 
with less specialized processors. These new 
problem areas, such as pre-press and satellite 
imaging, often have images with thousands or tens 
of thousands of pixels per line, and with tens or 
hundreds of bits per pixel. Further, the 
applications which manipulate these images often 
must run on processors with limited address space, 
such as the IBM PC [Microsoft87], the Macintosh 
[Apple85], or the Pixar Image Computer 
[Levinthal84]. In such environments, an algorithm 
such as the one presented is a necessity, not a 
luxury. 

If both the image and the calculation buffer are stored in 
core, than an 0(1) algorithm is only of interest in the rare 
case where memory is just barely big enough to fit one 
image. An 0(1) algorithm is of more use when the image 
is kept in secondary storage (disk, frame buffer, etc.) and 
only the calculation buffer is stored in-core. In that case, 
the difference between an 0(n) and 0(1) buffer can be 
significant 

3.1 Software Architecture 

The new algorithm analyzes the transformation and 
decomposes it into a series of calls to the Catmull-Smith 


algorithm. This allows the new algorithm to be 
incorporated as a extension to an existing library: no 
change in base software is necessary. Furthermore, any 
efficiencies or optimizations encoded into 
implementations of the Catmull-Smith algorithm can still 
be called upon: the wheel need not be re-invented. This 
does require, however, that the new algorithm issue 
‘'requests" for affine transformations which always 
transform rectangular sources into rectangular 
destinations, as this is the common format expected by 
software libraries. 

The algorithm presented in sections 6-11, therefore, 
satisfies two constraints besides the 0(1) constraint. First, 
that the Catmull-Smith code is sacrosanct, and may not 
be changed or modified This may be the case if that 
algorithm is provided in firmware, is written in micro¬ 
code, or is directly supported in hardware. Secondly, that 
the overhead involved in invoking that code is sufficiently 
significant that the number of calls to it should be 
minimized. If neither of those two constraints apply, then 
the simple and general algorithm presented in section 11 
is sufficient 


4. Notation 


The source image is the rectangular array of pixels 
comprising the picture which is to be transformed. The 
destination image is the rectangular array of pixels which 
is to receive the transformation. An in-place 
transformation is one in which the source image equals 
the destination image. The source map is the 
parallelogram which is defined by the map of the affine 
transformation over the source image. The source map 
and destination image will often be quite different Pixels 
may be in the image but not in the map, in which case 
they are to be cleared to some background value. Pixels 
may be in the map but not in the image, in which case 
they are clipped. See figure 1. 

Without loss of generality, we focus on the work to be 
performed by the x shear, where the shear acts on the x 
coordinates of the image without displacing it vertically. 
Let iw and ih be the width and height, respectively, of the 
source image. Let dw and dh be the width and height, 
respectively, of the destination image. 

In the equation x' = Ax + By + C, we will rename A as 
scale, B as tilt, and C as offset, yielding x* = scale * x + 
tilt * y + offset. 

Define dst(x, y) as the x component in destination space of 
the image under the affine transformation of the given x,y 
coordinate in source space. Similarly, let src(x\y ) be 
defined as the x component in source space of the pre- 
image of the given x\y' coordinate in destination space. 
Note that src(dst(x,y),y) = x. Both the dsi and src 
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functions return a single x value, with no y value; they 
both map from R 2 to R 1 . 

For example, a transformation which scales the input 
image up by 10% and shifts it left by y pixels on the y'th 
scanline would have 

x' — dst(x 9 y) = l.lx -y, and 

x = src(x\y) = (x* +y)/1.1 

We also assume the function bboxQ , which computes the 
bounding box in source space of the pre-image of the 
current destination rectangle, adding the appropriate 
amount at each end for filtering. Further, let M be the 
maximum size of a scanline in core, and let / be the 
amount which must be added to each end of a source 
scanline in order for a destination scanline to be 
calculated. The value of /is a function of the filter width, 
the particular filtering algorithm used, and the scale. 
When resampling, each source pixel influences source 
space for/pixels to the left and the right, yielding a total 
"penumbra 4 of 2f + 1 pixels in reconstructed source 
space. When scaling down (scale < /), this penumbra is 
spread over a great many destination pixels. Therefore, 
to ensure that at least 1 pixel can always be written, we 
require that 


M>=(2f+ l)/MlN(scflte,l) 


iw 

source image width 

ih 

source image height 

dw 

destination width 

dh 

destination height 

off 

x’ = off +x "scale + y * tilt 

bbox() 

bounding box 

dstO 

map from source to destination space 

*0 

map from destination to source space 

M 

pixel bandwidth limit 

f 

source padding due to filtering 


Table 1: Notation 


5. Why is this hard? The feedback problem 

What makes an CXI) solution difficult? If no more than 
M pixels can be read or written at a time, one might 
imagine calling the Catmull-Smith algorithm on each M- 
sized chunk of the input scanline, writing the results out a 
piece at a time. The problem with this approach is that 
when an output piece is written, it may well overwrite a 
future input piece. This problem, on a grander scale, was 
exactly why the Catmull-Smith algorithm took pains to 
ensure that y is unchanging in the x pass, and why the 
naive algorithm allocated a huge buffer: dst{\, y) may be 
less than x, equal to x, or greater than x, and this relation 
may vary even within a scanline. The problem is 
exacerbated by the fact that anti-aliasing requires that an 
entire neighborhood of source pixels be readable when 
computing a single destination pixel. 

Inefficiency is a further limitation of a per-line approach. 
The new algorithm will work by means of calls to the 
Catmull-Smith algorithm, which works on rectangular 
regions. It is a waste of inter-line coherence to call it on 
scanlines (or worse yet, parts of scanlines) only: we wish 
to call it as few times as possible, on regions as large as 
possible. 

The problem is therefore to divide the original region, 
which is too large to transform all at once, into a set of 
regions, such that no pixel in any source region is over¬ 
written before it is no longer needed, and the number of 
regions is as few as possible. 

6. Ad outline of the algorithm 

It is advantageous to deal with affine transformations in 
what we define as standardform : 

scale >= tilt , and scale > 0 

This form can be obtained by transposition (to satisfy the 
first clause) and reflection (to satisfy the second). 
Standard form allows a few more invariant assumptions: 
that increasing x increases x\ and that vertical inter¬ 
scanline coherence is greater than horizontal inter-pixel 
coherence. 

The algorithm consists of a series of transformation 
algorithms, with a case analysis to decide which 
transformation algorithm to perform. The transformation 
algorithms employed are more and more general and take 
more and more time: the case analysis finds the least 
general (and hence quickest) transformation algorithm 
appropriate for the particular transformation. The case 
analysis begins by dividing the source image into between 
1 and 3 sub-images, depending on the characteristics of 
the transform. This process is described in section 8. At 
most one of these sub-images will require further case 
analysis, described in section 9. Some of that sub-images 
sub-images may require even further case analysis. 
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described in sections 10 and 11. This completes all cases. 
This case analysis is performed purely for optimization 
reasons: increasingly more difficult cases are processed by 
increasingly more general (but slow) algorithms. One 
could avoid all case analysis by using only the algorithm 
of section 11, but that could be far slower. 

The case analysis routines all assume a subroutine, 
termed Helper, which performs an affine transformation 
on an arbitrarily large source and destination image, 
given the maximum internal scanline width M, and an 
evaluation order: left-to-right vs. right-to-left, and 
bottom-to-top vs. top-to-bottom. Before describing the 
case analysis in detail, we first describe ’Helper’, the 
foundation of the system. 


7. The Helper subroutine 


The Helper subroutine is used to perform a piece-wise 
affine transformation from a given source image into a 
given destination image. It assumes that it need not 
worry about feedback. The Helper subroutine is solely 
concerned with slicing up the transformation into 
manageable chunks. 

The obvious way to perform the Helper subroutine is to 
march along the source, transforming rectangles in the 
source into parallelograms which would then be written 
into the destination. This method was not chosen because, 
as Catmulll [Catmull80] says, ”[n]ot only is this 
inconvenient, it is also difficult to prevent aliasing errors”. 
Instead, the Helper subroutines marches along in 
destination space: each destination rectangle is inverse- 
transformed to obtain a parallelogram in source space. 
That parallelogram is then rounded out into a rectangular 
bounding box (padding by / for filtering), and the 
Catmull-Smith algorithm is called. Since the source 
bounding box may be significantly larger than the source 
parallelogram, the shear may try to write a set of pixels 
which lie outside the current destination rectangle. 
However, since the destination rectangle lies along exact 
pixel boundaries, simple clipping will reject these extra 
pixels. See Figure 2. 


Take a rectangle from the destination: inverse-map to the source 
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Figure 2: the Helper subroutine 

What should be the dimensions of the destination 
rectangle? We wish to make it as large as possible without 
exceeding the Af-pixel wide bottleneck. 

Suppose we decide on a destination rectangle of size dx 
by dy. Then the constructed source rectangle will have 
width 


sx = ((dx + (dy - /) \tilt\)/ scale) + 2f 

Either dx or sx will be bounded by M. When dx > M , let 
dx = M and dy = dh. When sx > M, there are more 
pixels to read than to write, 

dx < sx, 

dx < ((dx + (dy-1)\tilt\)/scale) + 2f 

We wish to minimize the number of destination 
rectangles, and therefore maximize the area of each. The 
problem is now to maximize dx * dy, where 
1<= dx<= M 

scale<(dx + (dy- l)\tilt\)/(dx• 2J) 

((dx + (dy - l)\tilt\)/scale) <= M 

This is a quadratic programming problem. We 
approximate it by maximizing dy (setting it to dh), and 
then solving for a putative dx. If dx < dy and dx < dw, then 
the putative rectangle is tall and skinny. To process an 
area closer to a square (this is desirable since Catmull- 
Smith has a per-scanline overhead), we recursively 
subdivide the rectangle by splitting it in two in y. 


8. The first level of analysis 



RIGHT 


LEFT 


Figure 3: Different types of shear 

The source image is subdivided into between one and 
three source sub-images, such that the map of the shear 
within each sub-image either: 

1) overlaps it on the left on every line (type left). 

2) overlaps it on the right on every line (type RIGHT). 

3) overlaps it on both the left and the right on every 

line (type both). 

4) overlaps it on neither the left or the right on every 

line (type neither). 

The dashed lines in figure 3 show the subdivision lines for 
that shear: the top region is of type right, the middle 
region is of type both, and the bottom region is of type 
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8 .1 Type left 

This case is easily processed. The Helper subroutine can 
be called on this region with an evaluation order of left- 
to-right. 

8.2 Type right 

The Helper subroutine can be called on this region with 
an evaluation order of right-to-left. 

83 Type neither 

In this case, the transformation is scaling the source image 
down, and there is not a very pronounced tilt However, 
evaluation order is still not clear: if offset (the translation 
component) is sufficiently large, then the first source 
pixels lie in the middle of the scanline: either right-to-left 
or left-to-right evaluations can fail. Therefore, this case is 
processed by decomposition into two shears: 

1) The Helper subroutine is called to perform the 

scale component of the shear. Without any 
translation, dst(x y y) < x for all x, and so the 
algorithm should proceed left-to-right 

2) The Helper subroutine is called again to perform 

the non-scale component If offset < = 0, then 
dst(x y y) < x, and so the algorithm should proceed 
left-to-right. Otherwise, it should proceed right- 
to-left. 

8.4 Type both 

This case is complex and treated in detail in the next 
section. 

9. The second level of analysis: type BOTH 

Using the case analysis of section 8, 3 of the 4 possible 
shear configurations could be processed. This section 
focuses on the fourth, most difficult case, when the source 
map overflows the destination image on both sides. This 
case is more difficult because every pixel in the 
destination must be written, and hence every pixel in the 
source is at risk of being over-written before it is read. 
Checks are first made to see if the shear falls into one of 
two special cases: 

9.1 small-source 

Suppose that the shear is a large scale-up, with little or no 
translation/tilt component. In this case, only a small part 
of the source is necessary. The algorithm can ‘'cut out'* 
that portion, and use the other portions as it wishes. The 
area of the source needed is considered “small" if and 
only if 

bboxf) <- dw 


This means that the area of the source needed to compute 
the destination (bbox 0) fits entirely within the destination 
(width dw). See Figure 4 for an example. 



neicle! sraal *" source: on P art °f the source is 

9.2 small-dest 

Conversely, suppose that the shear is a large scale-down. 
Regardless of the translation/tilt component, in this case 
only a small number of destination pixels are truly 
“derived": the others will be set to a certain background 
color. In this case the algorithm can compute that portion 
and then shift/shear it into place. The area of the 
destination needed is considered “small’ if and only if 

iw * scale <= dw 


This means that the width of the scaled source image (iw 
* scale) fits entirely within the destination. See Figure 5 
for an example. 



9.3 if either succeeds 

If either condition “small-source" or “small-dest" holds, 
then the affine can be implemented as follows (see Figure 
6 ): 

1) Copy the needed portion of the source, left- 

aligned, into the destination. 

2) Perform the scale component of the 

transformation using the Helper subroutine. 
Perform this right-to-left if scale > 1, and left-to- 
right otherwise. 

3) Recursively perform an affine transformation, 

using only the other components of the original 
affine transformation. 
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Figure 6: processing small-source 


10. The third level of case analysis 

Neither of the above special cases may hold. In that case, 
further analysis is required. 

Consider a given (x,y) pixel in the destination. There are 
three cases for that pixel: it may be far to the left of its 
source pixel pre-image, it may be far to the right of it, or 
it may be neither, it is “far to the left” if 

x'<= src(x\y)-f 

and "far to the right" if 

x'>= src{x\ y) +/ 

Define the boolean predicate left(x\ y) to be true if and 
only if a destination pixel is far to the left, and the 
boolean predicate right{x\y) to be true if and only if a 
destination pixel is far to the right Since the 
transformation is in standard form. left{x\y) implies 
left(x -l,y) and right(x\y) implies right(x'+ l,y). 

Due to the tilt in the shear, the rightmost leftQ pixel and 
the leftmost rightQ pixel may be at a different place on 
each line. Therefore, we define the function LEFTQ as 
the greatest x'such that left(w\y) for all w'< = x\ for all 0 
< = y < = dw. RIGHTO is similarly defined as the least x' 
such that all pixels to the right of it have the rightO 
property. Figure 8 shows a sample set of LEFTO and 
RIGHTO regions. 

Intuitively, the leftO and rightQ predicates detect those 
destination pixels which are "thrown clear” of their 


source pixels. The LEFTQ and RIGHTQ regions are the 
largest rectangles contained within the parallelogram- 
shaped leftQ and rightQ regions. 



RIGHTO / 

# 

✓ 


4 —' 


Figure 8: The LEFTO and RIGHTO regions 

If LEFTQ or RIGHTQ regions exist, then the Helper 
algorithm is performed on the destination/source regions 
defined by them, and the problem is trimmed 
accordingly. However, not all source pixels are freed by 
this operation: without further help, this operation would 
quickly grind to a halt. 

Therefore, the remaining source image is now split in two 
along y, and those affine transformations are now 
recursively analyzed. Splitting the image in this manner 
reduces the effect of the tilt in the shear, and may create 
LEFTQ and RIGHTQ regions in the sub-images which 
could not be formed in the originals. In Figure 9, for 
example, there are no LEFTQ or RIGHTQ regions 
originally, but subdivision creates 4 such regions. 


Ieft()/right() pixels are shaded 



Figure 9: Creating LEFTO/RIGHTO by subdivision. 

11. The fourth level of case analysis 

It is possible that the shear fits into none of the above 
categories. This happens when the source image is being 
very slightly scaled-up and there is little or no shift. 

In this case, we have no recourse but to buffer the affine 
transformation with saves/restores of selected areas of 
image memory, "stitching” the borders between the 
panels. Specifically, we now need two buffers B1 and B2, 
of/pixels each. On each scanline: 

1) Find the fixed point F. the pixel such that src(F,y) = 
F. 

2) Read the pixels from [F-f..F] into Bl. These pixels will 
be written when the left half of the scanline is processed. 
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but their original values are still needed to process the 
right half. 

3) call the Helper algorithm to evaluate the scanline from 
[0..F], left-to-righL 

4) Read the pixels from [F-fi.Fj into B2, and then write 
them with Bl. 

5) call the Helper algorithm to evaluate the scanline from 
F onwards, right-to-left. 

6) Write the pixels from[F-f..F] from B2. 

This algorithm can be performed in all cases, but it is very 
slow: it calls the Helper algorithm twice per scanline, and 
also must perform 4 buffer I/O operations. If, however, 
the algorithm is to replace the Catmull-Smith code, then 
these objections no longer hold, and the algorithm above 
yields a slower but more general replacement. 

12. Summary 

An algorithm has been presented to perform in-place 
affine transformations in constant space. The algorithm 
subdivides the transformation into a series of smaller 
transformations. Each smaller transformation is then 
performed using the Catmull-Smith algorithm. In this 
way, the new algorithm provides an additional level of 
capability to a graphics software library, which is 
particularly appropriate in environments where image 
sizes are huge and/or processor memory size is limited. 
The algorithm works by case analysis, chipping away at 
the problem by gradually imposing slower and more 
general algorithms on more difficult portions of the affine 
transformation: Figure 10 provides a summary. 

A summary of the algorithm: 

/• split the source into between 

• one and three sub-regions. 

• calls to ’Helper' are of the form 

• Helper(source,dest, 

• eval order, special notes); 

V 

dir2 = (offset <= 0.0) ?LtoR:RtoL; 
foreach subregion S do 
switch (type) { 
case LEFT: 

/• section 8.1 */ 

Helper(S.S.LtoR); 
break; 
case RIGHT: 

/• section 8.2 */ 

Helper(S,S,RtoL); 
break; 

case NEITHER: 

/• section 8.3 */ 

Helper(s,s,LtoR,scale); 

Helper(s,s,dir2, non-scale); 
break; 


case BOTH: 

if small-source or small-dest { 

/* see figures 4,5. and 6, 
sections 9.1 - 9.3 •/ 

Copy(needed-part(S), 

1 eft-part(S).left-aligned); 
dirl = (scale <= 1.0)?LtoR:RtoL; 
Helper(left-part(S),S, 
dirl.seale); 

Helper(S,S,dir2,non-scale); 

} else { 

/* section 10, figure 8 ♦/ 

If LEFT region exists { 
Helper(S.LEFT,LtoR); 

S a pruned-part(S); 

> 

if RIGHT region exists { 
Helper(S,RIGHT.LtoR); 

S = pruned-part(S); 

} 

/• figure 9 •/ 
if (height(S) > 1) { 

Recurse(top-half(S)); 
Recurse(bottom-half(S)); 

} else { 

/• section 11 •/ 
stitch 

> 

} 

break; 

} 


Figure 10: Summary of the Algorithm 


13. Extensions and modifications 


The algorithm may be extended to handle cases when the 
source image is a proper subset of the destination image. 
That case has not been discussed here for presentation 
purposes. For a discussion of that extension and other 
implementation issues, the interested reader is referred to 
[Fishkin89]. 
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Abstract 

Although technologies such as media spaces have been 
designed to facilitate collaborative work at a distance, the 
existing systems have primarily focused on the use of 
computer managed audio and video as mechanisms to 
support meetings and video phone calls. Research has 
shown that frequent and spontaneous informal 
communication is crucial for project coordination and work 
progress. It has also found that the amount of 
collaboration that occurs varies directly with the proximity 
of co-workers. However, proximity is not always possible 
or desirable in today’s work world. In this paper, we 
introduce the concept of a virtual open office, a simulated 
shared open office environment which creates proximity 
without its inherent disadvantages. We suggest that a large 
amount of communication among co-workers is not from 
actual intentional communication contact but from 
opportunistic contact and environmental scanning in 
which each individual is picking up valuable coordination 
information. We propose that it is this aspect of the 
constant contact of an open office environment which 
provides the closeness and cohesion necessary for effective 
work coordination. Based on this premise, we argue for a 
set of unique user requirements for the virtual open office 
and demonstrate an instantiation of these requirements in a 
working prototype, called VOODOO. 

Keywords: Groupware, computer supported cooperative 
work, desktop videoconferencing, informal 
interaction, virtual spaces. 

Introduction 

Collaboration occurs frequently in both academic and 
business environments. Furthermore, it is often 
impossible to have all collaborators working at the same 
physical location. This suggests that we need a way to 
support effective collaboration at a distance. New 
computer and communication technology make it no 
longer necessary to assemble all collaborators at the same 
place, but studies have shown that physical proximity is 


1 Also with the Faculty of Library and Information Science 


important for informal communication[10]. It has also 
been found that better coordination and facilitation are 
supported by informal communication[5] and that fifty 
percent of informal work communication is 
opportunistic^], e.g., triggered by the sight of another 
person. Research has also demonstrated the importance of 
being able to visually identify an opportunity for 
communication^]. 

A large number of media space projects[2, 4, 6, 17] have 
focused on establishing computer managed audio and video 
connections to enhance collaboration at a distance. Part of 
their purpose is to bring back the informal communication 
that has been lost, but a major portion is also to emulate 
the rich communication environment of face-to-face 
contact. It has not always been stated explicitly that the 
audio and video connections imply a meeting, either 
prearranged or serendipitous, but the designs of the 
systems and the discussion of their usage imply that the 
underlying technology structure is primarily for 
establishing an intentional contact, what we define in this 
paper as a meeting. We are taking a different approach in 
our research. The environment we want to create with the 
multi-media tools is not one of supporting meetings but 
one of supporting constant and continuous contact among 
co-workers. We want to simulate a shared office where the 
dwellers of the shared office space are miles apart or a 
simple corridor away. We do not suggest that meeting 
support by media spaces is inappropriate, but rather, that it 
is insufficient for the type and amount of communication 
needed in complex detailed work assignments. 

Even when co-workers are not conversing in a shared work 
environment, they are constantly transmitting details 
about the joint work. Proximal co-workers, for example, 
can overhear relevant conversations, view levels of partner 
progress, perceive changes in project direction, note co- 
worker’s skill advantages and disadvantages, etc. on a real 
time basis. Furthermore, they can instantly corroborate 
the acceptableness of any path changes they might make in 
the joint work. Unfortunately, the large number of 
advantages of a shared office which promotes this 
information exchange have been overshadowed by the 
large number of disadvantages of placing employees in the 
same room or a weakly partitioned room. We postulate that 
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if media spaces are configured correctly, we can gain back 
the advantages of the shared office without the incipient 
disadvantages, e.g., noise and interruptions. Furthermore, 
we propose that creating such a virtually shared office does 
not provide a primary benefit of travel cost reduction, but 
one of closeness and cohesion of co-workers engaged in 
joint work. Thus, media spaces are not just for enhancing 
communication at a distance but also for supporting 
communication within the same building and even on the 
same floor. 

Several existing media space systems incorporate some 
aspect of supporting informal communication within their 
structure. Bellcore has built Cruiser[17], a prototype 
desktop browsing tool which enables unplanned, informal 
social interaction via audio and video links between co- 
workers. Other example media space interfaces that have 
some elements of the virtual office are Polyscope and 
Vrooms[2] at EuroPARC. Polyscope is a system which 
distributes digitized images of workers within a building to 
provide awareness of the other person’s presence. It also 
acts as an interface to the audio and video network so that 
co-workers can make actual full motion audio and video 
connections. Vrooms is a modified Polyscope system that 
addresses some of the social and interface issues found in 
Polyscope. It employs a stronger spatial metaphor so that 
people can establish or terminate conversations by 
entering or leaving a virtual room. 

Because it has been found that visual accessibility can be 
intrusive at times, designers of Cruiser and Vrooms have 
implemented controls on excessive visual accessibility 
using rudimentary techniques such as bars (what they call 
video blinds) crossing the video images or still shots of 
the co-workers taken at 15 minutes intervals. For Vrooms, 
Borning and Travers[2] used small video images of co¬ 
workers to limit the intrusion of a constantly open video 
channel. At EuroPARC, the RAVE[6] media space uses 
user-tailorable buttons to make a variety of audio and video 
connections to co-workers. One of these buttons is an 
“office share” button which puts two co-workers in 
constant continuous full motion video contact similar to 
the virtual office space we are proposing. 

Although the above systems partially support informal 
interactions at a distance and awareness of co-workers, the 
approaches taken are ones of simply maintaining the video 
contact of the media space for a long period of time or of 
creating a general virtual meeting area where serendipitous 
contact can occur. This paper extends the continuous 
contact concept and discusses what other system 
operations need to be in place to effectively support 
continuous contact. We introduce the concept of a virtual 
open office — an open office in which physically 
separated co-workers are in constant contact through open 
communication channels. We believe that such a virtual 
open office, although not suitable for all forms of office 
work, will be useful for detailed technical collaboration, 
e.g.. joint programming. While a media space system is an 
infrastructure for facilitating collaborative work, a virtual 
open office is a software environment that is configured 
within the media space system to satisfy its unique set of 
user requirements. In the following section, we focus on 


the user requirements that are appropriate for the virtual 
open office environment. We then describe our 
instantiation of these requirements in a system we call 
VOODOO. 

User Requirements 

At the University of Toronto, we have built a media space 
called CAVECAT (Computer Audio Video Enhanced 
Collaboration And Telepresence)[12], Whereas CAVECAT 
is designed for making and breaking video and audio 
connections with one or more people, the virtual open 
office is set up to maintain connections with one or more 
people continuously throughout the workday. This means 
that co-workers residing in a virtual open office must be 
accessible for standard CAVECAT video calls just as they 
would be open to people walking into their shared office. 
It also means that such calls are made to all members of the 
open office not just to a single co-worker. Of course, 
private conversations can ensue just as they might in an 
open office, but all co-workers would be aware that such 
private conversations were taking place. Thus by creating 
a virtual open office in a media space, we add all nature of 
additional constraints on how that space is to be managed. 
We have combined existing research on open office 
communication behavior with experimental observations 
in our laboratory to generate a list of user requirements for 
the virtual open office. Although this list is not 
exhaustive, we have attempted to specify that set of 
requirements which preserves the advantages of an open 
office and eliminates its disadvantages. Table 1 lists the 
entire set of user requirements. 



User Requirements 

1 

Ability to implicitly establish a co-worker’s level of 
accessibility 

2 

Ability to enforce reciprocity in information 
exchange 

3 

Ability to explicitly set one’s level of accessibility 

4 

Ability to change one’s position with respect to co¬ 
workers 

5 

Ability to trivially make verbal and visual contact 

6 

Ability to trivially close verbal and visual contact 

7 

Ability to have multi-way conversations 

8 

Ability to support multi-media information exchange 

9 

Ability to filter out unwanted noise 

10 

Ability to discriminate among sounds in the virtual 
open office 

11 

Ability to obtain feedback on the communication 
environment 


Table 1: Virtual open office user requirements 


Accessibility 

In a normal open office environment, co-workers are in 
constant contact, and thus, they are always available for 
interaction. In an open office, it is not availability, but 
co-workers’ accessibility that is important, e.g., whether 
the co-worker is at an interruption point in their work or 
conversation. 
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Requirement 1: Ability to implicitly establish a co¬ 
worker’s level of accessibility 
Kraut, Egido and GalegherflO] and Allen[l] have postulated 
that physical proximity is crucial for informal interaction. 
Our survey of users of the CAVECAT media space has found 
that users do not make video connections for fear of 
intruding on the other party[4]. Root[17] has pointed out 
that people use implicit interaction protocols to indicate a 
willingness to receive a communication contact. After 
noticing that someone is in their office, people use cues 
such as the type of work a potential contact is engaged in 
to ascertain the occupant’s accessibility for interaction. 
For example, in a conventional open office, co-workers 
implicitly know not to interrupt a person talking on the 
telephone. Material placed in one's own workspace as 
opposed to a more common area such as a book shelf 
implicitly determines its viewability and thus, 
accessibility to others. The virtual open office 
environment should provide similar mechanisms for users 
to determine the accessibility of co-workers. Because this 
accessibility is established by spatial arrangements and 
events that are the normal course of work, this same 
implicitness needs to be available in the virtual system. 

Requirement 2: Ability to enforce reciprocity in 
information exchange 

When people used EuroPARC’s media space, Polyscope, 
video symmetry was almost never requested, that is, users 
did not ask to see who was looking at them. Users may not 
be aware of the unequal information exchanges supported 
by the system. In an open office, viewing is reciprocal. If 
I can see someone I know that person can see me. In a 
media space, this is not necessarily true. Furthermore, 
although I can see someone, they might be able to see more 
of me and at a much finer level of detail. This unequal 
exchange can cause severe imbalances in relations and 
exchanges. It is therefore necessary to explicitly enforce 
reciprocity in all information sharings in a virtual open 
office if natural coordination relations are to be 
maintained. 

Requirements: Ability to explicitly set one’s level of 
accessibility 

Chatting with co-workers is often a hindrance (albeit 
enjoyable) to work in an open office. So is maintaining 
document privacy. Therefore, control over conversational 
and workspace accessibility is an essential need for the 
virtual open office. 

Requirement 4: Ability to change one’s position with 
respect to co-workers 

Marmolin, Ahlstrom and Ropa[14] have found that people 
use a large video image for discussion, but when they are 
working intensely on their own tasks, they use a small 
video image for checking the communication status of their 
co-workers. Our own laboratory studies of two individuals 
working on a joint programming task over the media space 
showed that people did not use the visual image of their 
partner when they were focussing on the task. They only 
glanced at the video occasionally and sat away from the 
video screen. When they were negotiating a detail about 
the task, they moved their chairs directly in front of the 
video image and engaged in a more direct face-to-screen 


contact. In an open office, people often learn a large 
amount about their co-workers by glancing around their co- 
workers’ office or looking at what’s on their co-workers’ 
desk or book shelf. By walking over to the other’s desk, 
people are able to get a closer look. A virtual open office 
environment should be capable of handling the close 
contact as well as the environmental scanning which 
people use in their daily work to gain information about 
their colleagues. 

Communication Cost 

There are financial and behavioral costs associated with 
establishing communication. The behavioral cost aspect 
is more important for informal interaction at a distance. 
Different communication media have been shown to affect 
the collaboration process. The more limited the 
communication medium is, the less effective the 
collaboration process. If the behavioral cost is high, such 
as remembering and pushing several digits on a telephone, 
waiting for an answer and establishing a communication, a 
mental cost/benefit tradeoff will be calculated and 
communication below a particular threshold will not take 
place — even if the communication would have transmitted 
important information. Allenfl] has shown exponential 
drop-offs in the frequency of communication between co- 
workers as the physical distance increases. In their study 
of communication. Kraut et al.[ll] found that 52% of the 
conversations involved people located off the same 
corridor and 87% of the conversations took place among 
people who shared the same floor of a building. We believe 
that it is the cost of making informal contacts when 
distance increases that causes their significant fall off. 

Requirements: Ability to trivially make verbal and 
visual contact 

In the absence of close proximity, we need to make the 
initiation cost of communication very low to encourage 
frequent and spontaneous communication. In the joint 
programming studies we conducted in our simulated virtual 
open office environment, we found that conversations 
opened and closed without any formal protocols just as 
they might in regular exchanges. We noted that no 
conversation commencement and termination protocols 
occurred even when there existed gaps of five to ten 
minutes between verbalizations. We assume a state of 
communication that no longer needs formal contact 
protocols, much like that of a continuous conversation 
with pauses. Contact should be as easy as starting to talk 
or raising one’s head to gain other's attention. 

Requirement 6: Ability to trivially close verbal and 
visual contact 

It is equally important to have a low behavioral cost for 
terminating communication. Co-workers should not need 
to follow a formal closing protocol. After all, the 
conversation is assumed to be continuous in this state, 
only punctuated by pauses. Our studies of joint 
programming in our simulated virtual open office support 
this behavior as well. 

Requirement 7: Ability to have multi-way conversations 
It should be easy to have a third party join or leave a 
conversation. People should be able to make entrances 
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into a virtual open office conversation similar to the way 
they normally would in a physical open office. However, 
entrants into a physical open office often walk up to one 
person’s desk and engage in a private conversation. Such 
activities should also be supported in the virtual open 
office including the one in which the two conversants leave 
the open office for an even more private exchange. As in a 
regular open office, concurrent conversations of several 
subgroups of co-workers should be permitted to occur with 
listening support for other co-workers. However, such 
listening support should not permit others to overhear 
private verbal exchanges. 

Information Sharing 

Information sharing adds semantic content to a 
conversation and provides an underlying context for the 
discussion[ll]. Tang[19] has demonstrated that shared 
drawing space is not only useful for storing information 
and conveying ideas, but also for developing ideas and 
mediating interactions. Ishii[8] has shown the problems 
of work integration in shared computer environments and 
presented a video solution for seamless shared workspaces. 
Lauwers and Lantz[13] have suggested a set of user 
requirements for shared window systems to support sharing 
using existing collaboration transparent applications. 
Although document sharing is important, so is document 
privacy. Oldham and Rotchford[15] have shown that in an 
open office, co-workers preserve their ownership of a 
workspace by placing personal spatial markers around that 
space. There is the need for both defining and preserving 
the ownership of people’s workspace when that space 
becomes virtual. Thus there is a need for both workspace 
sharing in a virtual open office but also one of allowing a 
user to establish limits on this sharing. 

Requirements: Ability to support multi-media 
information exchange 

In our shared programming study, we emulated a virtual 
open office by setting up two offices back-to-back and 
passing cables for exchanging computer screen images and 
camera images to monitors on the other side of the wall. 
We used Y-cables to split each computer’s video output and 
send it to a second screen on the other side of the wall. 
Thus, each programmer had a view of what activities and 
code the other programmer was working on as well as video 
and audio connections to their co-worker. We found that an 
extensive amount of time was spent looking at the other 
person’s code and hand copying it from one terminal to 
another. Each person also pointed to elements in the other 
person’s code but this action was not visible by the second 
person in our rudimentary setup. From these exploratory 
studies we ascertained that the virtual open office 
environment should have all shared work information 
available to all parties in the office. Co-workers should be 
able to easily share objects such as drawings and text[3]. 
Telepointing facilities should be provided for people to 
easily refer to objects in the shared workspace. The system 
should be able to handle off-line as well as on-line material 
and to allow synchronous annotation of screen objects. 
Co-workers should be able to easily demonstrate processes. 
For example, a person may show the execution of a 
program which contains bugs and ask the other person how 
to correct the program. Co-workers should also be able to 


easily copy relevant screen objects from their co-workers * 
machine. 

Environmental Improvement 

Use of open audio and video channels leads to concerns 
about preserving an individual’s privacy[2]. An open 
office is not private[16, 18], but workers are always aware 
of this lack of privacy. A virtual open office is 
problematic because the cues to indicate a privacy problem 
may no longer be available. Privacy issues in a virtual 
open office are very different from that in a media space 
system. A virtual open office does not provide co-workers 
with total privacy, thus it is not a suitable environment for 
people who want to work privately. Co-workers in a 
virtual open office face the problem of distractions such as 
noise. Noise has been a common complaint of workers in 
open offices[18]. Ambient noise in the office and noise 
generated from co-workers’ chatting and typing forces 
people to make an extra effort to concentrate on their work. 
One of the problems with an open office is that there is so 
much noise that it is very difficult to discriminate between 
the sounds we want to hear and those we want to shut out. 
The inability for participants of a media space to localize 
sound makes it more problematic because the location cues 
used for sound filtering are lost. During CAVECAT 
sessions, on many occasions several people would answer 
their phone when the phone rang in another office. Users 
of our CAVECAT system have expressed the desire for 
system generated cues to help them spatially separate 
sound sources[4]. 

Requirement 9: Ability to filter out unwanted noise 
Limited options exist when a co-worker is very noisy in a 
physical open office but the virtual open office makes it 
simple to quiet a noisy inhabitant just by lowering a 
volume control. Thus, one should be able to muffle out the 
noise generated by a co-worker. This silencing capability 
needs to be reciprocal, i.e., users should be able to prevent 
a conversation from reaching others as well as inhibiting 
conversations from disturbing them. 

Requirement 10: Ability to discriminate among sounds in 
the virtual open office 

Through the use of technology, e.g., a three dimensional 
sound system or different phone rings for different people, 
the system should provide implicit cues to people for 
spatial separation of sound sources and identification of 
salient signals. 

System Status 

Awareness of both the physical and social environment is 
required for maintaining informal activities in a virtual 
open office. Information and feedback should be provided 
so that collaborators have a constant overall picture of the 
work environment. 

Requirement 11: Ability to provide feedback on the 
environment 

People may have set up their phone to be accessible so that 
they can receive phone calls, but the receiver may have 
been misplaced in its handset so that it is still off the 
hook. The system needs to give clear feedback on the 
settings of each worker’s personal accessibility settings at 
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all times. The environment should be able to signal users who is talking to whom in the office. A virtual open office 
of inconsistencies in their desired and actual settings and of should provide appropriate feedback to its dwellers so that 

temporary (non default) settings they have selected that are they are fully aware of co-workers ' communication and 

still in place. In an open office, one has complete accessibility status, 
awareness of the office environment and can always tell 

r 4 Virtual Office Settings Tools 17:49:06 £D <& 



Figure 1: Three co-workers use VOODOO to work in a virtual open office 

The Design of VOODOO change the view to engage in a more intimate interaction 

with co-workers. 

We have built an instantiation of the virtual open office at 

the University of Toronto. We call it VOODOO. It works A conversation is started by moving the mouse cursor to 
as follows: Teams of co-workers are assigned to one virtual picture of the co-worker and clicking on the mouse 

open office that is their permanent office, i.e., the button. Eye contact or verbal hailing catches the co¬ 
equivalent to their physical office in the real world. worker's attention. A conversation is closed by a mouse 

Whenever co-workers log into the media space system, click to toggle to an audio off state. A person can join an 

they are, by default, put into their permanent virtual open on-going conversation by moving the mouse cursor to the 

office. They may be the only occupant, in which case, they picture of one of the participants and clicking the mouse 

have a virtual private office. On the worker's computer button. Several subgroup discussions can happen at the 

screen are small, faraway shots of the co-workers who are same time in the virtual open office without interference, 

in the office. The worker is visually aware of co-workers' 

activities yet is not disturbed by their typing because the Screens of current work can be shared between the 

typing sound has been filtered out. The worker knows that conversing parties and users can mutually point to a topic 

viewing of others is reciprocal and that preventing of interest on these shared screens. An occupant can 

someone from viewing oneself is done only by temporarily leave the virtual open office, e.g., go to lunch, 

relinquishing the privilege of seeing the other person. or can permanently leave, e.g., move to another office 

Different views of co-workers can be obtained, one faraway space or simply leave the project. Users can freely walk 

and one closeup. The default view is faraway. One can bito any existing virtual open office to which public 

entrance has been permitted, but are restricted to be in one 
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virtual open office at a time. Figure 1 illustrates what the 
screen of a user might look like when the user is in a virtual 
open office called cave with two co-workers. Each of the 
windows represents a person that is present in the office. 
At the base of each window are a button labelled s for 
adjusting communication states between each person, e.g., 
changing the view of that person’s office, and a panel for 
displaying the current communication state. A discussion 
is going on between Mantei and JinLi. The other office 
member Buxton is working at his desk and not part of the 
conversation. When an audio connection is made between 


Mantei and JinLi, unless explicitly set to be different, 
video images of both users will fade into close-up views 
during their discussion. The white border around the image 
indicates the audio is on, while the black border indicates 
audio is off. After the conversation, their images will 
again fade into their original state, most likely a faraway 
shot. In this example, JinLi is the owner of this screen, 
that is, the picture of this screen is viewed from JinLi’s 
workspace. 



User Requirement 

Design Solution 

1 

Ability to implicitly establish a co-worker’s level of 
accessibility 

Constant open video communication channel 

m 

Ability to enforce reciprocity in information exchange 

Enforced audio, video and computer screen symmetry 

m 

Ability to explicitly set one’s level of accessibility 

Explicit software settings available 

p 

Ability to change one's position with respect to co- 
workers 

Close-up and fly on the wall cameras 

m 

Ability to trivially make verbal and visual contact 

Open video channel and mouse click on image 

6 

Ability to trivially make verbal and visual contact 

Open video channel and mouse click on image 

7 

Ability to have multi-way and concurrent conversations 

Software to explicitly select participants for a 
conversation 

8 

Ability to support high quality information exchange 

Shared computer screens, telepointing, workspace 
viewing and (not implemented)document tray metaphor 

m 

Ability to filter out unwanted noise 

Muffler metaphor(partially implemented) 

10 

Ability to discriminate among sounds in the virtual open 
office 

3D sound system with MIDI(not implemented) 

m 

Ability to provide feedback on the environment 

Diagrams of connections(not implemented) 


Table 2: Design solutions to the user requirements 


VOODOO is implemented on a Macintosh computer using a 
client-server architecture. The server resides on a 
S PARCStation and keeps an updated database of resources, 
connections and activities in the virtual open office. Table 
2 illustrates the proposed solutions to the user 
requirements. We discuss the features of VOODOO in the 
following sections. 
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Figure 2: Enter a virtual open office 


Figure 2 shows how a user enters a virtual open office by 
selecting the appropriate office from the virtual offices 
menu displayed at the top of the Macintosh screen. The 
menu dynamically displays all the existing virtual offices 
that are accessible to users and shows who are currently 
working and where they are physically located in the 
respective virtual offices. Once a user has entered a virtual 
office, all video images of co-workers in the office appear 
as small windows on the user’s screen. Via the open video 
channel, the user can easily tell who is currently working 
in the office and implicitly establish co-workers’ 


accessibility. This satisfies user Requirement 1. VOODOO 
uses full motion but low spatial resolution video for the 
faraway shot. Full motion video provides users with real 
time visual awareness of events, while low quality video 
preserves co-workers ’ privacy and prevents excessive 
visual intrusion. When co-workers are engaged in a 
conversation to negotiate details about a task, full motion 
closeup video is used to enhance information exchanges. 
Since co-workers can see each other when they are visually 
accessible, this meets the reciprocity constraint of 
Requirement 2. Associated with each individual image on 
the screen is a set of communication and workspace 
attributes that specify a person’s accessibility. They can 
be explicitly set. For example, a user can turn off the audio 
coming from a colleague who is unusually noisy. This 
capability satisfies Requirement 3. 

VOODOO uses several video cameras to provide users with 
different views of their co-workers so that the complete 
office scene of a co-worker can be viewed and a richer 
information exchange can occur. The closeup camera is 
placed on top of the computer screen and captures a head 
and shoulders image of a co-worker. The “fly on the wall’’ 
camera is mounted on the office wall and captures the 
interior of the office. Figure 3 shows how a user can move 
closer to another co-worker for an intimate interaction or 
further away to get a broader view of the office by holding 
down the mouse button on a video image to get a popup 
list. This setup fulfills Requirement 4. 
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IIIIIggHBfflliPM man 


tel^caue B£QI 


Full audio and Farauiay uleui 
No audio and Farauiay uleui 
Full audio and Closeup uleui 


VMuffled audio and Farauiay uleui 
Full audio and No uideo 
No audio and No uideo 
* Priuate Side Comment 


Figure 3: Control communication states 


To initiate verbal contact, users need to use the mouse to 
click on the video image of the co-worker they wish to 
communicate with. Although the action must be done 
explicitly, it has the explicitness and ease of use that are 
intrinsic to the social protocol of calling out a co-worker’s 
name in a face-to-face contact. Another approach that is 
being considered is the usage of sensors to detect head 
movement. As a user raises his or her head and starts 
talking, the head direction selects the contact person and a 
voice activated audio connection is made. Closing a 
conversation is also done by a mouse click. These 
interface features meet the requirements of low behavioral 
cost for making and breaking communication 
(Requirements 5 and 6). The current VOODOO system does 
not have the voice activated audio feature, but we are 
working to set up the appropriate hardware and software to 
incorporate it in the next version. 


Users can selectively decide who is not supposed to 
overhear the current conversation with menu selections. 
Alternatively, a user just needs to mouse click on the 
picture of one of the participants of a conversation to 
smoothly join the meeting. This interface design meets 
Requirement 7. 


We propose a document tray metaphor for users to share 
documents while preserving ownership of workspace. 
When a user places a document in a co-worker’s document 
tray, that co-worker immediately gains viewing access to 
the document. However, should the user decide that the co- 
worker no longer has access to the document, the user can 
remove the document from the co-worker’s tray. Even if 
the co-worker is reading the document, when the user takes 
the document, the co-worker loses access to the document. 
A telepointer is available in a unique color for each user. 
This allows each co-worker sharing a document to point to 
important information they are discussing. Their 
telepointers can be seen on all other screens displaying the 
document. Documents can be all possible documents 
generated by Macintosh applications. We have provided 
users with an application tools chest (Figure 4). Users can 
run other collaborative or single-user Macintosh 
applications within the VOODOO system to perform 
different tasks with their co-workers. Through the use of 
the document tray metaphor, telepointers and shared 
computer screens, a high bandwidth of information can be 
easily exchanged and thus. Requirement 8 is fulfilled. The 
document tray metaphor has not been incorporated into our 
initial version of VOODOO. 


The muffler metaphor is used to circumvent the noise 
problem in an office. Users have the option to muffle the 
audio signals at several different levels. They can muffle 
noise such as a co-worker’s typing sound but permit normal 
human voice to be transmitted across the audio connection. 
The level of muffling is controlled by a slider interface. A 
user sets the level by adjusting the slider bar. When a user 
is disturbed by the typing sound of the co-workers, after 
adjusting the muffling level, the typing sound will fade to 
the background and be filtered out completely. This 
feature, which satisfies Requirement 9, has not been 
implemented in the current system prototype. 



Figure 4: Run applications within VOODOO 


One of the problems with current media space is the lack of 
proper audio feedback for users to spatially separate their 
co-workers. Three dimensional sound hardware and 
software systems can reproduce audio while preserving its 
spatial properties. This permits a user to recognize where 
the sound is coming from. If such a system were 
implemented in our interface, Requirement 10 could be met. 

A diagram of current communication connections is 
provided to co-workers to give feedback on who is talking 
to whom in the virtual open office so that co-workers are 
constantly aware of communication activity in their 
environment. For example, when a full audio connection is 
established between two co-workers, the diagram could 
show cables connecting one co-worker’s microphone 
connected to the other’s speaker. This would then meet 
Requirement 11. 


Conclusion 

A unique set of user requirements for the virtual open office 
concept has been presented. VOODOO, a working 
prototype of the virtual open office concept, provides an 
environment for collaborators so that easy and seamless 
informal interactions can be achieved and so that co- 
workers are constantly aware of their colleagues’ activities 
without problematic interruptions and noisy 
environments. Further detailed user testing and field study 
are required to evaluate the usefulness of the virtual open 
office. 


Acknowledgements 

For research support, the authors are indebted to the Natural 
Sciences and Engineering Research Council of Canada 
(NSERC), the Information Technology Research Centre of 
Ontario (ITRC), Apple Computer Inc., Digital Equipment 


Graphics Interface ’92 








122 


Corporation, IBM Canada’s Laboratory Centre for 

Advanced Studies in Toronto and Rank Xerox EuroPARC. 

References 

1. Allen, T. J. (1977) Managing the flow of technology. 
Cambridge, MA: MTT Press. 

2. Boming, A. and Travers, M. (1991) Two approaches 
to casual interaction over computer and video 
networks. Proceedings of CHI*91 Conference on 
Human Factors in Computing Systems, New Orleans, 
LA, April, 1991, 13-19. New York: ACM Press. 

3. Buxton, W. A. (1991) Telepresence: Integrating 
shared task and personal spaces. Submitted to 
Groupware* 91. 

4. Buxton, W. A. and Sellen, A. J. (1991) Interfaces for 
multiparty video conferences. In Proceedings of 
CHI*92 Conference on Human Factors in Computing 
Systems. Monterey, CA, May, 1992. 

5. Daft, R. L. and Lengel, R. H. (1986). Organizational 
information requirements, media richness, and 
structural design. Management Science , vol. 32, 554 - 
571. 

6. Gaver, W. W., Moran, T., MacLean, A., Lovstrand, L., 
Dourish, P., Carter, C. A. and Buxton, W. (1991) 
Working together in media space: CSCW research at 
EuroPARC. Proceedings of the Unicom Seminar on 
computer Supported Cooperative Work: The 
Multimedia and Networking Paradigm. London, 
England, July, 1991. 

7. Gullahom, J. T. (1952). Distance and friendship as 
factors in the gross interaction matrix, Sociometry , 
vol. 15, 123 - 134. 

8. Ishii, H. and Miyake, N. (1991) TeamWorkStation: 
Towards an open shared workspace. Communication 
of the ACM. 

9. Kendon, A. and Ferber, A. (1973). A description of 
some human greetings. In Michael, R. and Crook, J. 
(Eds.) Comparative Ecology and Behavior of Primates, 
591 - 668. London: Academic Press. 

10. Kraut, R., Egido, C. and Galegher, J. (1990). Patterns 
of contact and communication in scientific research 
collaboration. In J. Galegher, R. Kraut and C. Egido 
(Eds.) Intellectual Teamwork: Social and 
Technological Foundations of Cooperative Work, 
327-350. Hillsdale, NJ: Lawrence Erlbaum 
Associates, Inc. 

11. Kraut, R. E., Fish, R. S., Root, R. W., and Chalfonte, 
B. L. (1990). Informal communication in 
organizations: Form, function, and technology. In 
Oskamp, S. and Spacpan, S. (Eds.) People's Reactions 
to Technology, 145 - 199. Beverly Hills: Sage 
Publications. 

12. Mantei, M.M., Baecker, R.M., Sellen, A.J., Buxton, 
W.A., Milligan, T. and Wellman, B. (1991) 
Experience in the user of a media space. Proceedings 
of CHI* 91 Conference on Human Factors in 
Computing Systems, New Orleans, LA, April, 1991, 
203-208. New York: ACM Press. 

13. Lauwers, J. C. and Lantz, K. A. (1990) Collaboration 
awareness in support of collaboration transparency: 
Requirements for the next generation of shared window 


systems. Proceedings of CHI*90 Conference on 
Human Factors in Computing Systems, Seattle, WA, 
April, 1990, 303-311. New York: ACM Press. 

14. Marmolin, H., Ahlstrom, B. and Ropa, A. (1991) A 
laboratory case study of a co-working task using 
picturephone, shared screens and speech 
communication. Unpublished report. Laboratory of 
Human computer Interaction, Linkoping, Sweden. 

15. Oldham, G. and Rotchford, N. (1983) Relationships 
between office characteristics and employee reactions: 
a study of the physical environment. Administrative 
Science Quarterly , vol. 28, December, 1983, 542-556. 

16. Oldham, G. (1988) Effects of changes in workspace 
partitions and spatial density on employee reactions: 
a quasi-experiment. Journal of Applied Psychology , 
vol. 73, May, 1988, 253-258. 

17. Root, R.W. (1988). Design of a multi-media vehicle 
for social browsing. In Proceedings of CSCW*88 
Conference on Computer-Supported Cooperative 
Work , Portland, OR, September 1989, 25-38. New 
York: ACM Press. 

18. Sundstrom, E. (1986). Work Places: The Psychology 
of the Physical Environment in Offices and Factories. 
Cambridge: Cambridge University Press. 

19. Tang, J.C. and Minneman, S.L. (1990). VideoDraw: a 
video interface for collaborative drawing. Proceedings 
of CHI *90 Conference on Human Factors in 
Computing Systems, Seattle, WA, April, 1990, 313- 
320. New York: ACM Press. 


Graphics Interface ’92 



123 


Telepresence: Integrating Shared Task and Person Spaces 


William A. S. Buxton 


Computer Systems research Institute 
University of Toronto 
Toronto, Ontario, Canada MSS 1A4 


Abstract 

From a technological and human perspective, shared space 
in remote collaboration has tended to iocus on shared space 
of either the people or the task. The former would be 
characterized by traditional video/teleconferencing or 
videophones. The latter could be characterized by syn¬ 
chronous computer conferencing or groupware . 

The focus of this presentation is the area where these two 
spaces meet and are integrated into what could be 
characterized as video-enhanced computer conferencing or 
computer-enhanced video conferencing. 

From the behavioural perspective, the interest lies in how - 
in collaborative work - we make transitions between these 
two spaces. For example, in negotiating, the activity is 
mainly in the shared space of the participants themselves, 
where we are "reading" each other for information about 
trust and confidence. On the other hand, in preparing a 
budget using a shared electronic spreadsheet, for example, 
the visual channel is dominated by the task space. 

How well systems affords natural transitions between these 
spaces will have a large impact on their usability, useful¬ 
ness, and acceptance. Consequently, we investigate the de¬ 
sign space and some of the issues affecting it. 

Keywords: Human-computer interaction, CSCW, 

Videoconferencing, Groupware. 


Introduction 

Groups play an important role in our work-a-day life. 
Physical proximity facilitates interaction among group 
members. Even splitting groups across two floors of the 
same building can have a negative effect on group dynam¬ 
ics (Kraut & Egido, 1988), yet in many organizations 
groups are distributed across campuses, cities, countries, or 
even the globe. The health of these organizations is tightly 
coupled to the ability to maintain a sense of "group," de¬ 
spite such distances. Our interest lies in developing telep¬ 
resence technologies appropriate for fostering such main¬ 
tenance. 

As we use the term, telepresence is the use of technology to 
establish a sense of shared presence or shared space among 
geographically separated members of a group. The topic is 
of particular interest now due to the ongoing convergence 
and affordability of the requisite computer, 


telecommunications and audio/video technologies; 
however, if these technologies are going to be deployed in 
anything other than a tail-wagging-the-dog technology- 
driven manner, we must first develop a better 
understanding of what we mean by "shared space" or 
"shared presence" in the context of group interactions. 

In what follows, we begin to investigate what is shared in 
various types of group interactions, and some of the tech¬ 
nological implications of supporting such sharing. Our 
purpose is "consciousness raising" rather than the presenta¬ 
tion of formal theories or models. Our case is made primar¬ 
ily through the use of examples. Our hope is to provide 
some foundation for making better design decisions and 
better exploiting the potential of existing and evolving re¬ 
sources. 


Starting from the Known 

The terms "meeting" or "group interaction" are almost de¬ 
void of information since they encompass such a broad 
range of activities. Each has its own set of properties and 
purposes. Only by understanding these properties can we 
hope to design the appropriate affordances into supporting 
technologies. 

This is nothing new. Take architecture as an example. 
Because it is a mature discipline, we think of it as part of 
the general ecology of work, rather than as a technology. 
Yet a technology it is, and very much a technology to 
support group activities. Consider, then, the different 
types of group activities that are a part of our everyday 
work, and how the affordances of this technology have 
been designed to support them. We clearly understand 
differences of purpose, and choose the space (office, 
lounge, laboratory, board room, gym, lunch room, etc.) 
accordingly. 

Because the technology is mature, we have a good sense of 
how to match the activity to the space. In order to be con¬ 
sidered mature, the electronic meeting spaces of telepres¬ 
ence must meet the same dual criteria of supporting a com¬ 
parably rich range of group activities and doing so in such 
a way that users have the same transparent sense of 
appropriateness of space-to-activity. 

To speak of "videoconferencing" or "telepresence" is anal¬ 
ogous to speaking about "buildings." While having some 
value, the grain of analysis is too course to foster an under- 
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standing of what goes on "inside.” While we would never 
do so with rooms in a building, our current level of 
(im)maturity with electronic spaces has a tendency towards 
"one size fits all." This is something that we must break out 
of. The range of electronic meeting spaces, like the range 
of spaces in a well-designed building, must match the 
richness and range of meeting types. As a start to achieving 
this, we can move from the level of "buildings" to that of 
"rooms" and try to gain some insight into the nature of 
some of the different spaces that we warn to share. 


Person and Task Spaces 

In what follows, we are going to consider presence in terms 
of two spaces: that of the person and that of the task. From 
even such a simple cut, several interesting insights emerge. 

What we call shared person space in telepresence is the col¬ 
lective sense of copresence between/among group partici¬ 
pants.* This includes things like their facial expressions, 
voice, gaze and body language. 

By shared task space we mean a copresence in the domain 
of the task being undertaken. If we were doing a budget, 
for example, this might mean that each of us has the budget 
in front of us in the form of a shared speadsheet. Despite 
the distance, each of us can act upon it to make changes, 
annotations, or just to indicate cells that are the subject of 
discussion. 

Sometimes the person and the task spaces are the same. One 
example would be in negotiations or counseling. Here a 
major part of the task involves "reading" the other person, 
such as to evaluate confidence or trust^. In other cases, 
such as our budget example, person and task spaces are 
more distinct In what follows we shall see that different 
technologies lend themselves to differing degrees in sup¬ 
porting these two spaces. The point that we are leading to 
is that one of the most important attributes of a system is 
the seamlessness of their integration (Ishii & Miyake, 
1991), and how well they match the needs of the activity to 
be supported. 

Video Conferencing and Person Space: Some 
Examples 

Traditional videoconferencing is a fairly good example of 
attempting to establish shared person space. While nobody 
would ever be fooled into thinking that the remote parties 
were actually in the same room, one can at least maintain an 
awareness of who is present and get a general reading of 
their body language, for example. The absence of checks 
like, "Are you still there Marilyn?" that are characteristic of 
telephone conferences is an example of what video con¬ 
tributes to maintaining a sense of personal presence. 


^This is in contrast to ’personal space" which carries the 
connotation of privacy, not sharing. Thanks to Hiroshii 
Ishii for making this point and prompting me to change 
my terminology. 

^ This is sufficiently important that we might well refer to 
these as trustification , rather than communication 
technologies. 


Fig. 1, illustrates one example of how video can be used to 
maintain a sense of personal presence in a four-way meet¬ 
ing. 



Figure 1: A videoconference involving four participants. 


The quality of the shared person space can be improved 
through design, however. Below, we give some examples 
that illustrate the breadth of the available design space. 
While many of these techniques are well known, few have 
found their way into mainstream videoconferencing. If 
establishing a strong sense of person space is important, 
then perhaps current practice needs to be reexamined. 

For example, traditional videoconferencing is typically af¬ 
flicted by an inability to establish eye contact among par¬ 
ticipants. This is because of the discrepancy of the position 
of the image of your eyes on my monitor and the position 
of your effective (surrogate) eyes, the camera, which is 
typically located on top of the monitor. 



Figure 2: The Reciprocal Video Tunnel. Through the 
combination of a mirror and half silvered mirror, there 
appears to be direct eye-to-eye contact. The mirrors 
effectively place the camera right in the line of sight. A 
close approximation to reciprocal eye contact can be ob¬ 
tained if both parties are using such an arrangement 
(from Buxton & Moran, 1990). 

By adopting teleprompter technology from the broadcast 
industry, this problem of eye contact can be largely over- 


Graphics Interface ’92 








125 


come. The technique is shown in Fig. 2, as it was imple¬ 
mented by William Newman at Rank Xerox EuroPARC. 
Two mirrors, one of which is half silvered, are used to 
reflect what is in front of the screen up to the camera, which 
is mounted on top of the monitor. 

The use of such teleprompter-like technology to obtain eye 
contact is not new. It was patented in 1947 (Rosenthal, 
1947), has been studied by Acker & Levitt (1987) and used 
by Newman (as mentioned above), and more recently in a 
novel form in the Clearboard system (Ishii & Kobayashi, 
1992). While it's use is not widespread in videoconferenc¬ 
ing, users report greater comfort and naturalness in face-to- 
face meetings carried out using the technique. 

Portrait painting provides the lead for another approach to 
augmenting the nature of personal presence usmg video. 
Video monitors have what is called a landscape aspect 
ratio (the ratio of the width to the height of a video 
monitor), because of their horizontal orientation. A very 
simple trick is to turn the camera and monitor at both ends 
of a conference onto their sides. The result, illustrated 
graphically in Fig. 3, is a portrait style aspect ratio. 



Landscape 


Portrait 


Figure 3: The effect of switching from Landscape to 
Portrait aspect rations in person-to-person video 
conferences . Note that, all other things being equal, in 
the portrait orientation, the hands and desk-top are 
visible, thereby adding to the ability to use a richer 
vocabulary of body language in the dialogue . 

When the image of a single person is to be transmitted, 
more of that person’s body is visible without changing the 
size or resolution of the face. Consequently, in the exam¬ 
ple, the hands of the participant are visible in the portrait 
version, as would be the desk-top. The design affords access 
to a richer vocabulary of body language. As a prototype 
unit built by colleagues from the University of Ottawa has 
shown, this approach can be particularly effective where 
screen size is constrained, such as with small desk-top units, 
since a larger screen surface is available for a given width of 
package. 

Next, let us consider the case of where we want to have a 
meeting involving the participation of more than two sites. 
At the University of Toronto, we have developed a system 
called Hydra, in which each remote participant is repre¬ 
sented by a video surrogate (Sellen, Buxton & Amott, 
1992; Buxton & Sellen, 1991) . The technique involves 
having a separate camera, monitor and speaker for each re¬ 
mote participant. As we have implemented it, these com- 


3 After the fact, we have become aware that this approach 
was first developed by Fields (1983). 


ponents are housed in very compact desk-top units, as 
shown in Fig. 4. 



Figure 4: A user is seated in front of three Hydra units. 
In the photo, the Hydra units sit on the table in the posi¬ 
tions that would otherwise be occupied by three remote 
participants. Each Hydra unit contains a video monitor, 
camera, and loudspeaker. A single microphone conveys 
audio to the remote participants (From Buxton & Sellen , 
1991). 

Using this arrangement, the notion of person space is pre¬ 
served. Because of this it is potentially much easier to 
maintain awareness of who is visually attending to whom, 
and to take advantage of conversational acts such as head 
turning. The idea behind the design is to take advantage of 
existing skills used in the work-a-day world. For example, 
in comparing this technique to other approaches to 
supporting multiparty conferences (Sellen, 1992), the 
Hydra units were unique in their ability to support parallel 
conversations, which naturally occurred in the face-to-face 
base-line condition. 



Figure 5: Using a projected image to obtain a life-sized 
cross-table presence. Participants are captured using a 
miniature camera on the desk-top, so as to minimize ob¬ 
struction of the projectted image. In our installation, we 
use one of the Hydra units (camera only), illustrated in 
Fig. 4. 
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lion available, rather than some video image. Here is a sit¬ 
uation appropriate for complimenting video conferencing 
with shared synchronous computation. 

Using dial-up telecommunications links, or computer net¬ 
work, there are a number of ways that multiple users in re¬ 
mote sites can work together on a single computer applica¬ 
tion. A number of firms use such software, combined with 
teleconferencing, to provide remote product support. 



Figure 8: Liveboard (Weiser, 1991): by combing large- 
screen interactive displays with advanced networks and 
distributed software, shared "whiteboards" can be 
provided to support brainstorming sessions and other 
collaborative work from remote sites. 

Environments such as the X window system, coupled with 
large interactive displays, such as Xerox PARC’s Liveboard 
(Weiser, 1991) are leading towards technologies to support 
distributed brainstorming sessions that preserve many of 
the properties of same-room sessions based around a white¬ 
board. 

What we see from the examples is that we can use a range of 
techniques to support both shared and person spaces, and 
that being able to do so is important to supporting group 
activity across distances. What we haven’t seen - to this 
point - is very much on how these two types of spaces work 
together, or relate. 


Integrating Shared Task and Person Spaces: Two 
Examples 

Shared ARK (Smith, O'Shea, O’Malley, Scanlon & 
Taylor, 1990) was one of the first studies to be undertaken 
at Rank Xerox's Cambridge EuroPARC (Buxton & Moran, 
1990). It was an investigation of joint problem solving: 
subjects had to determine - through the use of a computer 
simulation - whether one stayed dryer by running or 
walking in the rain. Subjects were in separate rooms. They 
had a high fidelity voice link and a video link, 
implemented using the reciprocal video tunnel shown 
diagrammatically in Fig. 2. 

The simulation was a distributed application presented to 
each user on a networked workstation, and took two people 
to operate. Within the task space, each user was ''visible" by 
way of an identifiable cursor in the form of a hand. The 


relationship of the workstation and video tunnel is shown 
in Fig. 9. Note that the position of the video tunnel is akin 
to having the remote participant sitting right beside you. 
Eye contact can be established by a simple turn of the head, 
and voice contact can be maintained throughout. 



Figure 9: Shared ARK (Smith, O'Shea, O'Malley, 
Scanlon & Taylor , 1990): The shared task space is on 
the computer display on the left . The shared person 
space is via the video tunnel on the left which is an 
implementation by William Newman of the design shown 
in Fig.2. 

As with working on a paper on your desk with someone by 
your side, you couldn't look at your collaborator's face and 
the computer screen at the same time. So one aspect of in¬ 
terest was determining when subjects visually attended to 
the computer display, and when they established eye con¬ 
tact through the video tunnel. A pattern did emerge in 
which eye contact was established especially when they 
were initially negotiating how to proceed and at the end 
when checking results. When actually running the simula¬ 
tion - which was a visually vigilant task - the video tunnel 
was seldom used except for short glances. 

Remember, however, that the video tunnel was not the only 
vehicle for establishing shared person space. While attend¬ 
ing to the computer display, each user's surrogate "hand” 
provided a (limited) visual personal presence through its 
pointing and gesturing capability. This was supplemented 
by the voice channel (and in a later study, Gaver, Smith & 
O'Shea, 1991, nonspeech audio). When visual attention 
was directed at the computer screen, the speech and non¬ 
speech audio established a shared space which was more ef¬ 
fective than the highest fidelity video display. 

While what we have described is an over simplification of 
the experiment, it is adequate to establish that subjects 
moved between task and person spaces as they moved 
through different components of the overall task. What we 
take from this is the observation that some (many or most?) 
complex tasks require a range of channels and modalities of 
communication in order to be effectively supported. The 
reason that Shared ARK was so effective was because the 
methods and overhead in switching contexts (such as from 
computer screen to eye contact) had the same overhead and 
action as is used in analogous work-a-day tasks. That is, 
they were built on existing everyday skills that subjects al- 
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ready possessed, resulting in a natural behaviour. This is 
evident to anyone watching the experimental tapes. 

Videodraw (Tang & Minneman, 1990) and its successor 
Videowhiteboara (Tang & Minneman, 1991), are excellent 
examples of a smooth integration of shared personal pres¬ 
ence m a distributed task space. The systems were con¬ 
cerned with providing tools to support design and brain¬ 
storming activities, such as one would encounter around a 
drawing pad or whiteboard, respectively. 

Videowhiteboard's main power came from its sensitivity to 
the need to support both drawing and the body language 
and gestures that typically accompany design and 
brainstorming at a whiteboard. Consequently, the system 
cleverly enables participants to be visible one another on 
the drawing surface, much like in the face-to-face situation. 
This is illustrated in Fig. 10, which is a frame from a video 
of a work session with the system. 


Summary and Conclusions 

Through the use of examples, we have argued that effective 
telepresence depends on quality sharing of both person and 
task space. Through this, the interaction breaks out of be¬ 
ing like watching TV, into a direct engagement of the par¬ 
ticipants. They meet each other, not the system. 

The integration of these two types of space are important. 
The smoothness of transitions between them is critical. 
Without this, the natural flow of interaction is disrupted. If 
the flow is to be natural, then the overhead and styles of in¬ 
teraction used in everyday face-to-face meetings should set 
the standards and design basis for telepresence technolo¬ 
gies. 



Figure 10: Videowhiteboard (Tang & Minneman, 1991): 
an excellent example of effectively blending shared 
person and task space. The remote participant appears 
as a shadow on the far side of the drawing surface. The 
approach supports a rich vocabulary of physical 
gesture, including the ability to anticipate intended 
actions . 

What we hope the examples have illustrated is that, just as 
m traditional meeting spaces, one size doesn’t fit all. There 
are a range of reasons that people meet and bonds that hold 


groups together. Our technologies must reflect these rea¬ 
sons and bonds, and their richness. Current technologies 
do not excel in this regard. What we hope to have shown is 
that this need not be so. 

The design space, as afforded by available and emerging 
technologies, is far richer than is evident by popular 
practice. Hopefully the examples help show the potential 
and provide some keys to how it can be untapped. 
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Abstract 

Computer-supported cooperative work environments 
change some of the underlying beliefs about solutions that 
have been built for distributed computing. Electronic mail 
and file transfers have worked efficiently and effectively by 
breaking the information to be transferred into packets and 
reassembling the packets at the destination. This is not a 
viable solution for handling shared real time drawing or 
writing. Integrity has been maintained by locking out 
portions of a database until an update is completed. Such 
lockout is not always suitable in a groupware interface. 
Other solutions are necessary for resolving conflicts. 
Client-server architectures have worked well for managing 
distributed processing but replicated architectures, coupled 
with their fragility and synchronization problems are 
needed for groupware products in order to preserve 
acceptable local response time. The addition of multi-media 
to the environment complicates the problem more. Control 
of analog video requires a client-server environment which 
can build in intolerable delays as distances between 
communicating parties increase. This paper approaches 
design criteria for shared software from the human side and 
points out profound architectural problems that need to be 
solved if multi-media cooperative work environments are 
to function effectively. 

Resum£ 

Les environments de taches cooperatives mediatises par 
ordinateur changent certaines des croyances sur les 
solutions qui ont ete mise en oeuvre pour l'informatique 
distribu6e. Le courrier eiectronique et le transfert de fichiers 
ont fonctionn6s avec succks et efficacit6 en s6parant 
l’information en paquets qui sont reassembles a' Tanivee. 
Cette solution n’est pas viable pour traiter la composition 
cooperative de textes ou de dessins en temps reel. 
L'integrite a ete maintenue en bloquant Facets de portions 
de base de donn6es jusqu'k la fin de leurs mise k jour. De tel 
blocages ne toujours conviennent pas pour une interface de 
informatique de groupe. Des solutions nouvelles sont 
necessaires pour r6soudre ces conflits. Les architectures 
client-serveur fonctionnent bien pour g6rer des processus 
distribues. Mais des architectures rdpliques, avec leurs 
fragilitd et leurs problkmes de synchronisation, sont 


necessaires aux produits groupware afin de preserver les 
temps de r6ponse locaux acceptables. L'addition de multi- 
media complique encore le problkme. Le controle de la 
video analogique demande un environement client-serveur 
qui cr6e des d£lais intolerables lorsque les distances entre 
interlocuteurs augmentent. Cet article pr6sente des entires 
de conception de informatique de groupe d’un point de vue 
utilisateur et fait apparaitre des problkmes architecturaux 
profonds qui doivent etre rSsolus si Ton veut permettre aux 
environements coop6ratifs multi-mddia de fonctionner de 
manikre efficace. 

Introduction 

The 1980's were the decade of the personal workstation. 
The 2000*s will be the decade of computer-supported 
cooperative work (CSCW), in which shared windows will 
open on multiple workstations across hallways and even 
across continents (Baecker, 1991). Video images of co- 
workers in both real and delayed time will complement the 
shared windows. Shared work environments and multi¬ 
gigabyte fiber transmissions will be commonplace. Audio 
connections are moving to the airwaves (cellular 
telephones) while more intensive data transfer (video 
conferencing and shared work) is taking over the high 
bandwidth networks (Dertouzos, 1991). As exciting and 
imminent as these changes are, implementing CSCW 
environments requires an essential rethinking of 
workstation hardware, distributed systems, networks and 
data transfer to support the new environments. 

People are real time systems. Much of today's distributed 
processing solutions have been built on the assumptions 
that delays do not matter and that packet switching and 
asynchronous tranfer of information is acceptable 
CTanenbaum, 1992). When the distributed human processor 
is added to the system, these assumptions are no longer 
valid. With individual interfaces, certain tolerances of 
delays and non-synchronicity of events were accepted 
because the work being accomplished was private. In 
cooperative environments, work is performed jointly and 
interaction requires each individual to exhibit a range of 
subtle behaviours for negotiation, teambuilding, 
expressing doubt, etc. What worked before won’t work 
now: the social cost is far too great. 
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The above paragraph characterizes the nature of the 
constraints facing the builders of cooperative work 
environments. The constraints are what I call "deep" 
system problems, i.e., their solutions are based on the very 
architecture and hardware on which the system is built. The 
constraints are also human interface problems, i.e., they 
are brought on by the needs of the human user. 

A common misconception in computer science is that the 
user interface design can be separated from the working part 
of a system. This misconception has led to the 
development of user interface toolkits and user interface 
management systems for supporting the design process. 
Although these tools have been effective in making some 
aspects of designing interfaces easier, they can blind the 
designer from perceiving more functionally based 
difficulties with the interface design. In CSCW, it is clear 
that these tools are not enough and that more fundamental 
changes to the underlying system are necessary if the 
interface is to work. Thus, the basic premise of this paper 
is that CSCW is one human interface area that needs to 
reach deep into the computer science field for support. 
Tools for manipulating screen objects and handling input 
devices no longer suffice. 

Why is the design of cooperative work interfaces so 
challenging? Humans are social animals and have very 
finely tuned skills for handling social contact. If the tool 
they use affects their ability to project themselves or to 
maneuver comfortably in the environment created by the 
tools, the entire tool will be rejected. In contrast to 
individual interfaces, where the very privacy of working at 
the interface protects the person learning the interface from 
ridicule, the public nature of shared interfaces makes users 
extraordinarily sensitive to small problems. 

The sections which follow in this paper take a human- 
centred point of view in examining the design issues in 
building CSCW systems. Human needs are stated and 
explained. What these needs translate to in terms of 
underlying system design is then discussed along with a 
presentation of various solutions that have been tried and 
the problems that have been encountered. A major 
difficulty with constructing an architecture that fulfills one 
human need of a CSCW system is that the architecture then 
violates another need. Often all current design solutions are 
found inadequate and in need of new research to make the 
cooperative system work appropriately. 

A major focus throughout the discussion is the assumption 
that thousands of users will someday be working with the 
system so that scalability, flexibility, robustness, and 
support for workstation heterogeneity are important issues 
(Arango, et al., 1992). Many solutions for today’s 
prototype systems, which manage 15 to 100 connections, 
are not viable for interoffice connections of most major 
corporations or government organizations which may 
involve thousands of simultaneous connections. 

For my discussion I select two representative CSCW 
technologies. I begin with desktop video conferencing 
systems (also called media spaces), which represent those 
technologies designed to allow people to meet visually 
while still remaining a long distance from each other. This 


technology is invariably connected to computing 
technology which supports the video and audio 
connections but also provides some form of shared 
software. The interface needs for supporting shared 
software are treated in the next section. 

There are many aspects of CSCW that I do not discuss. They 
include such items as shared calendars, hypertext systems, 
office coordination systems, video mail, group decision 
support systems, etc. Each of these applications is either 
asynchronous - and therefore does not make the demands on 
the distributed processing that synchronous linkage entails 
- or poses problems that are the same as those arising from 
the desktop video conferencing and shared software. 

This paper is a discussion of problems not solutions, but it 
is one with a hopeful note. It provides a new set of 
constraints that help to focus the design process and 
suggests rethinking the mechanisms that are used to handle 
distributed processing and video network traffic. It 
suggests that solutions can be found and often points to 
potential paths to take especially in the real time systems 
area. 

Desktop Video Conferencing 

Desktop video conferencing involves the interconnection 
of offices which can be relatively far apart (e.g., in 
different cities) by a complete video and audio hookup 
(Watabe, et al., 1990; Crowley, et.al., 1990). It has some 
similarities to a videophone, in which the two parties in a 
long-distance conversation have a visual picture of each 
other in addition to audio. It is also more than a 
videophone because it supports more services, e.g., 
meetings, browsing, shared common areas, etc. Desktop 
video conferencing systems are designed to support the 
ubiquitous visual contact that facilitates collaboration in 
day-to-day communication (Kraut, Egido & Galegher, 
1988). History has demonstrated a very poor market for 
videophone calls and video conferencing (Egido, 1988), 
but current research with media spaces (Goodman & Abel, 
1986; Buxton & Moran, 1990) has indicated effective 
casual communication uses that the visual channel 
supports. Figure 1 illustrates a typical configuration of the 
type of network infrastructure that supports desktop video 
conferencing. 

Today's desktop video conferencing architectures assume 
different channels of communication for the different 
media. The video is shipped via coaxial cable or fiber and 
the audio is sent either by normal telephone connections or 
by separately mixed audio. The computer controls switch 
boxes which make the connections. Tomorrow’s A/V 
communication will be digital but still shipped by physical 
connections, e.g., dark fiber. This makes the soft 
switching of the connections an easier problem, but each 
type of communication must still be handled uniquely (Vin 
et al., 1991). This is inherent in the nature of the 
information being shipped. Audio for multi-person 
contacts needs to be mixed separately to filter out noise 
from each site. Video requires compression to ship. 
Effective compression strategies are hybrids of techniques 
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Figure 1 Typical configuration of current desktop video conferencing configurations. The video and audio are shipped by 
analog and a local area network passes the control informaton to a central server which is manageing the 
switching.. 


based on changes in the visual scene over time and 
information present in the visual scene (Fox, 1991). 
Thus,scenes from the different sites in the multi-person 
conversation will be compressed and managed differently. 
Exchanges of computer messages handling the shared 
workspaces will form the third channel. In addition to 
managing shared computer workspaces, the computer 
signals on this channel will also be used to manage the 
transmission of the audio and video channels, e.g, 
authorizing an additional person to enter into a video 
meeting. 

Current desktop video conferencing installations ship the 
audio and video over analog channels and use a client- 
server architecture to manage the switching (Buxton & 
Moran, 1990; Arango et al, 1992). The client computer is 
located close to the switching centre and maintains a 
database of connection states to manage the A/V 
switching. Digitized video and audio transmission need not 
be controlled by a centralized switching arrangement 
opening up other possible control architectures. 


In the discussions of the user interface to desktop video 
conferencing which follow, digitization of all signals is 
assumed. This opens the possibility of comparing client- 
server versus distributed A/V controls from the standpoint 
of user needs for this environment. This also opens up the 
issue of packaging the different varieties of digitized 
information that are to shipped plus a much larger issue, 
that of shipping the very large amounts of data arising 
from digital video. Current network solutions have not 
been designed for this type of traffic. The list of user issues 
associated with video conferencing is endless. Below, I 
discuss six important ones to illustrate the major impact of 
network architecture on the desktop video conferencing 
interface. 

Modality Synchronicity 

Humans communicate by many modes. They talk, they 
gesture, they point, they look at things with their eyes. 
What they are relatively unaware of is how synchronized 
these activities are. Gestures end at the end of a clause, 
eyes change direction at sentence end. Not having this 
synchronicity is symptomatic of underlying brain 
disorders. The problem is a variation of motor aphasia, and 
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it is very disturbing for others to engage in communication 
with a motor aphasic person. Despite a feeling of deep 
uneasiness, the recipient of the asynchronous 
communication cannot describe what is wrong with the 
conversation. An example that most of us are familiar with 
is the TV character. Max Headroom. 

When voice and video signals are captured and shipped 
separately, it is likely that they will arrive slightly out of 
synch. Such loss of synchronization is worse than a badly 
dubbed movie because it destroys all the redundant cues the 
listener relies on for understanding the conversation. If, in 
addition, the individual is heard to be typing or seen to be 
drawing in the video picture, and the screen update 
information arrives late or early, the flow of discussion 
proceeds haltingly as each person waits for corroboration 
that all information has arrived. Early video conferencing 
systems often used a speaker phone to send the audio 
portion of the signal, but quickly found this setup to be 
inadequate. Human speech is highly overlapping (Buxton 
& Sellen, 1992), but speaker phone technology does not 
permit this overlapping. Without the visual channel, 
attempts to speak are not apparent. With the visual 
channel, the limitations of the speaker phone are all too 
visible. 

Desktop video conferencing systems can be designed to 
pack and ship all signals as one packet, but this leads to 
other problems. Audio is very different from video and 
demands different handling characteristics. Although both 
are continuous signals, the sampling rates can be very 
different. Screen events will also have different closure 
times than either video or audio signals. A series of mouse 
selections taking 500 milliseconds or more may occur 
before it is possible to send a valid user event. Whether the 
audio, video and user event information is sent digitally 
down a single channel to be unpacked by the receiving 
workstation or down separate channels, synchronous 
display of all channels of information is essential for the 
success of the system. 

Entering and Leaving Sessions 

Unlike telephone contact, visual contact is rarely restricted 
to two people in an active workplace. Sighting other 
individuals working together in an office is often an 
invitation for a third and possibly fourth person to join the 
conversation (Root, 1988). Video sessions need to support 
similar behaviour. This implies a video conferencing 
capability in which participants can scan all video 
conversations and ask to join those which interest them. It 
also implies an environment in which people can leave a 
video conversation and new people can join maintaining a 
continuous thread of connected people even after both 
originators of the conversation have left. 

A variety of mechanisms exist for handling multiple 
individuals in a video meeting. Most arrangements mix the 
video signal at a centralized source and ship the merged 
version down a single video channel to each recipient. 
Compressing the video signal makes mixing more 
difficult. Nevertheless, this can be done at a 
communications bridge which unpacks each signal, mixes 
it and then compresses it again for shipping. An inherent 


disadvantage is incurred in having to unpack the video 
signal twice. Mixing the video at a centralized site makes 
it easier to build interfaces that allow others to peer into 
meetings that are going on but centralized mixing is an 
architecture that does not scale up very well. Centralized 
video mixing also means that a software bridge is needed 
for each meeting that is taking place, even for two person 
meetings because of the potential for adding more 
participants to these meetings. 

Video signals can alternatively be sent directly to each 
workstation and mixed there, but this implies that each 
workstation have the ability to handle that level of input 
traffic. Since digitized video signals already require high 
bandwidth to ship, this is currently not a feasible 
arrangement 



Figure 2 A user is seated in front of three Hydra units. In 
the photo, the Hydra units sit on the table in front 
of the chairs that would otherwise be occupied by 
three remote participants. Each Hydra unit 
contains a video monitor, camera, and 
loudspeaker. A single microphone conveys audio 
to the remote participants. (From Buxton & 
Sellen, 1992) 


Hydra is an example of video meetings that are not mixed at 
a central site (Buxton, 1991). It is one of the more 
innovative interfaces that solves the problem of audio and 
positional location in video conferences. Hydra provides a 
video image, camera, speaker and microphone in a small 
desktop unit for each person engaged in the meeting. If four 
people are meeting, a Hydra user would have three units on 
the desktop, each one representing a virtual person in the 
meeting. Figure 2 contains a diagram of one of the Hydra 
units. Hydra currently works by shipping video signals 
down separate channels. This type of transmission uses an 
enormous bandwidth. Hydra also has no mechanism for 
displaying a visual picture of the meeting taking place to 
others who might want to join the meeting. Although 
Hydra solves important social considerations for meetings, 
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its implementation in a large scale communication system 
would be very difficult 

Maintaining Continuous Sessions 

Our research at the University of Toronto has shown that 
one of the very viable modes of a desktop video 
conferencing environment is an open channel (Li & 
Mantei, 1992). A visual channel is maintained 
continuously between one or more offices to support ad hoc 
information exchange by people working closely together. 
These connections are maintainable with the architecture 
shown in Figure 1 but do not handle a private video call. A 
private call is different from a meeting. The caller wants to 
talk to one person only. Such a setup is equivalent to 
having a two-party line, but the line needs to support the 
visual contact for both parties yet not connect ail other 
parties into a video conference. The setup also needs the 
capability of connecting the parties into a video 
conference, if at some point, the caller wants to join the 
larger group. Thus, the underlying connection architecture 
needs to be able to switch the caller to the software bridge 
which is maintaining the other connections 

Privacy Considerations 

Up until now, we have only discussed the problems of 
making different multiple connections and have not 
addressed the issue of preventing users of the system from 
making visual contact. It is believed that being able to 

view co-workers at work provides useful opportunities for 
communication which co-workers take advantage of (Kraut 
et al., 1988). Limiting the video channel to a meeting or a 
video call prevents these visual opportunities yet keeping 
an open visual channel infringes on individual privacy. 
What is needed is a system that is both designed to permit 
large amounts of visual browsing but that also allows 
participants to limit browsing intrusions by giving 
browsing rights to a subset of people. The underlying 
design issue is where to keep the privacy informaiton, 
locally or in a centralized database. 

Privacy settings can be kept at each desktop node, but if 
people are browsing meetings, then the privacy 
considerations of each person in the meeting need to be 
handled. Such privacy screening can cause long delays for 
larger meetings where nodes in many different cities require 
checking. A faster way is to manage meetings at a 
centralized location with privacy being the lowest common 
denominator of the assembled group. 

Soft Communication Failures 

A common problem with long distance communication 
technology is one of providing alternatives to 
communication failures. The basic failure is one of not 
reaching the individual called. Voice mail and the 
telephone answering machine are solutions to these 
problems. When direct contact fails, there is a backup 
storage device for leaving an audio message. This works 
well for voice communications, but video contact because 
of its much larger bandwidth and cost of storage may not be 
amenable to the same solutions. Currently, voice mail 


storage is centralized. Callers record their message and 
have options of replaying it and re-recording it if it sounds 
unsatisfactory. Users can then call the central storage and 
access their calls. This setup works reasonably well for 
voice messaging. How would it work for video messaging? 

A caller would send a video message to a centralized site. 
This would be digital video involving a very large amount 
of storage. If the caller wanted to play back the message, 
the digital video would be sent back to the caller. This is a 
large volume of network traffic which can be avoided by 
local playback and shipping when the message is 
completed. However, local playback requires local storage 
which is currently expensive for digital video. This 
problem can be cheaply bypassed by storing analog video 
which can be digitized when the message is finally ready to 
transmit. Local video storage is also prone to failure which 
implies additional backup strategies that use (1) other local 
storage devices or (2) a centralized device when all else 
fails. The underlying argument is that mechanisms for 
supporting transitions to video mail cannot be handled by 
using the current storage solutions for telephony. 

Support for Multiple Contacts 

Video wears two hats in the office. It is a broadcast medium 
and a meeting tool. This implies support for both functions 
in the desktop video conferencing environment. It also 
implies the capability to move between either function 
with little difficulty. Broadcast video involves a multipoint 
setup that distributes a single video image to many sites 
but does not necessarily receive video images from these 
sites. Broadcast connections are often sent to interrupt 
other connections such as the audio landing notifications 
on airplanes which interrupt music channels, but 
broadcasts can often be secondary information that users 
would like to see interruptible by video phone calls. A 
network which manages broadcasts is configured 
differently than a network which handles point to point 
contacts. In a desktop video conferencing environment, 
both activities need support. 

Activity Sharing Software 

Activity sharing applications are programs in which 
multiple individuals working on separate workstations can 
simultaneously be viewing and updating the same work. 
These workstations can be a few metres or hundreds of 
kilometres apart. Messages sent along networks joining 
the workstations update the screens with each users input. 
A variety of such programs exist for different workstation 
platforms and some are commercially available such as 
Aspects tm (Group Technologies, 1991). Shared activity 
applications primarily support shared writing and shared 
drawing tasks (Fish, et al., 1988; Tang & Minneman, 
1990, Greenberg & Bohnet, 1990), but prototypes also 
exist for supporting programming (Brothers, 
Sembugamoorthy & Muller, 1990) and database design 
work (Hayne & Ram, 1990). Most software has been 
written as application specific, e.g., for wordprocessing 
only, but some work has been done to build an underlying 
operating system or window system that will support a 
wide variety of single user applications (Lauwers & Lantz, 
1990; Lauwers et al., 1990). 
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Two mam types of architecture support the shared activity 
sessions. The first is a client-server architecture in which 
the application resides on a designated workstation and all 
other workstations send messages to update the data 
structure residing in the application (University of 
Michigan, 1990). The application, in turn, sends out 
messages to update the screens on each workstation. The 
second is a replicated architecture. (Replicated architectures 
are also called serverless or peer-to-peer architectures.) In 
the replicated architecture, an application resides on each 
workstation and exchanges messages with all other 
workstations, accepting input generated by other users and 
sending input to update the screens of the other users 
(Crowley et al., 1990). Replicated architectures have 
serious fragility problems (Ahuja, Ensor & Lucco, 1990) 
because of difficulties with synchronization and 
heterogeneity in workstations whereas client-server 
architecture generate a large amount of message traffic and 
have to maintain states of all workstations. Hybrids of 
partially replicated and part client-server architectures also 
exist (Mawby, 1991; Lu, 1992) which bring the advantages 
and disadvantages of both solutions. 

From the human perspective, either solution has problems 
in supporting collaboration. Instead of either extreme, the 
hybrid between the client-server and replicated architecture 
needs to be tuned very carefully to meet the user constraints 
listed in the sections that follow. The shared architectures 
need to be concerned with maintaining adequate system 
response time, adding latecomers to a work session, not 
disrupting the flow of work with lockout procedures, 
synchronizing the work between collaborators, etc. 

Response Time 

A user who is developing drawings using a mouse or a 
stylus interface needs to have the immediate feedback on 
the results of their motor behaviour displayed on the screen 
if they are to use their finely tuned hand-eye coordination 
capabilities. Lags in displaying figures on the screen are 
unacceptable and make the drawing task extraordinarily 
difficult. Current client-server arrangements which send the 
user input to the server and then back to the client are far 
too slow for this microsecond coordination. 

Users also need to see the drawing behaviour of their 
collaborator in real time. If their collaborator is drawing a 
box or circle or selecting text to delete, the size and 
placement of of the figure or the span of text selected show 
up on the screen as feedback to the user before a key is 
released or other action taken to indicate acceptance of the 
sizing, placement or text selection outlined on the screen. 
If these events are not shown in real time to the 
collaborator it is difficult for the collaborator to comment 
on the work being done (Lu & Mantei, 1992). A client- 
server architecture could synchronize these times better 
than replicated architectures especially if distances between 
sites were large, but both architectures will have problems 
with synchronizing this continuous event traffic. This is a 
real time concern. 

Hand-eye coordination is important to support in shared 
wordprocessors as well as shared drawing tools. Text that 
is delineated by a mouse controlled cursor is a variation of 


drawing behaviour. Gesturing and pointing on text require 
similar hand-eye coordination and thus, real time support. 

Adding Latecomers 

Meetings rarely begin or end with all people present. 
Attendees join sessions late and others leave early. This 
requires updating the newcomer to what has transpired in 
the meeting and making sure that the early leaver has a 
final document of all the decisions made at a meeting. 

In a shared activity session, the problem of updating the 
workstation of a latecomer is a different task depending on 
whether a client-server or a replicated architecture is used 
for managing the shared session (Chung, 1991). With a 
client-server architecture, the problem is one of setting the 
window parameters of the newcomer and updating the 
shared session information of the other participants. With 
a replicated architecture, the problem is one of transferring 
a copy of the work to the newcomer's workstation. On the 
social behaviour side of the issue, the problem is one of 
integrating the newcomer into the meeting with little 
disruption. The newcomer needs to obtain a copy of the 
current state without interrupting the flow of changes and 
additions that are being made to the work product. 
Currently, most replicated architectures lock out all work 
until the newcomer's workstation is updated. Others 
simply do not permit latecomers. Client-server 
architectures such as Rendezvous (Patterson et al., 1990) 
can support a more graceful entry but run into problems in 
providing an accurate replication of other user's 
workstation environments. This problem becomes 
particularly acute in heterogenous workstation 
environments. Although the client-server architecture is 
much better at handling the latecomer problem, the effects 
of a client-server architecture on response time latency 
strongly support a replicated solution. 

Lockout and Concurrency Issues 

Up until recently, users have not had the capability to work 
in the same space at the same time. Shared writing, drawing 
and spreadsheet software give us this functionality but not 
without integrity concerns. If one user deletes a sentence at 
the same time another user is modifying it, what is the end 
result? One solution is to prevent modification access to 
any object that is already in use by another user (Ellis & 
Gibbs, 1988), but this creates new problems. To be locked 
out, text or figures need to inform all workstations of their 
modification status. Both the informing process and the 
interrogation process take time. Users neither like waiting 
for access to a screen object (e.g., a word) before they can 
use it or being denied access seconds after they have 
attempted to modify the screen object. The denial of access 
is a particularly acute problem because users have already 
begun their cognitive work plan when they attempt to 
select the object (Card, Moran & Newell, 1983). Users 
want to know in advance when text or figures are locked out 
to avoid unproductive mental effort. 

In most writing tasks, concurrency is not an issue. Lockout 
is usually at the character level as is seen in such shared 
editor systems as Aspects (Group Technologies, 1991), 
ShrEdit (University of Michigan, 1990), and 
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SASE.(Mawby, 1991). With such a fine resolution for 
lockout, the probability of concurrent access is low. 
Writing is such a complex cognitive process, that people 
composing text prefer to do it on their own (Posner & 
Baecker, 1991). However, one of the major times when 
users are likely to collide occurs frequently. When one 
participant in a shared session is typing an idea and others 
are observing its generation, typing or spelling errors 
occur which other users jump in to fix. They cannot 
perform these correction because this text is locked during 
creation. 

Ellis, Gibbs and Rein (1988) give a good presentation of 
alternate solutions to lockout which support a more 
flexible participant environment. One of their suggestions 
is that of allowing individual work to diverge and putting 
the onus of the repair on the participants. Another 
possibility, especially for drawing tasks, is to ignore 
concurrency and to throw away messages that request 
changes to objects that no longer exist. This is done in 
CaveDraw (Lu, 1992) where one user can delete the very 
layer another user is drawing on. Collision events are 
assumed to be relatively infrequent and negotiable by the 
other multi-media support available. For other shared 
software tasks, similar optional concurrency control 
measures can be in place. This implies that the software 
which controls concurrency issues will need to be smarter 
and know when and where concurrency will be problematic. 

View Synchronicity 

Users of collaborative tools often do not use them in a 
collaborative fashion. There are collaborative tasks that 
people perform that are so complex, e.g., writing or 
programming, that interacting with others inhibits the 
performance of the task (Neuwirth, 1990). Often 
collaborative work sessions are a combination of people 
working together, parceling out the work to be done and 
then working individually on the work. The advantage of a 
collaborative work environment is that it allows co- 
workers to smoothly move into and out of collaborative 
mode. This type of observed movement between 
collaborative and non-collaborative states is common in 
computer supported meeting rooms (Mantei, 1988). 

The underlying application support for allowing both 
types of work patterns in a collaborating group will permit 
participants to have different views of the work product. It 
also will have a functionality for synchronizing 
participant’s views. Synchronized views work in tandem. 
When one participant scrolls to a new place in the text, the 
other participants see their windows scroll as well. To 
synchronize views, the software needs to maintain 
information on the screen states of each of the 
participating workstations. Since individual users may 
have differing window sizes, use different fonts and even 
have different objects occluded, synchronization requires a 
large amount of mapping operations between each 
workstation. 

Since a sequence of user events at one workstation is often 
necessary to determine the synchronization at the other 
workstation, arbitrarily breaking up the events into 
message packets can destroy this context. For example, a 


user can scroll to a position and then se\ec\ Nive visible 
text. The workstation that is synchronized with the first 
workstation can scroll lo an approximate location, but if 
text is selected in a part of the window on the first 
workstation that is occluded on the second workstation, the 
second workstation will have to execute a second scroll and 
then show the text selection. This type of synchronization 
looks jumpy and awkward to the participants, not to 
mention the concern that both participants are still not 
seeing the same text. 

Occlusion in many window systems may be difficult to 
detect because one of the fundamental premises of window 
systems is that applications should be able to write into 
windows without concern for what portions of the window 
are occluded (Scheifler, Gettys & Newman, 1988). Without 
control of occlusion, view synchronization becomes a 
difficult problem. 

Synchronization is also one case where concurrency 
control is necessary. Once screens are synchronized, scroll 
bar usage can lead to "scroll wars" as can other global 
events which update the screen (Stefik et al., 1987). So, 
although synchronization is very much at the interface 
level, mechanisms that pass the streams of user behaviour 
between the synchronized workstations also have to 
examine these event streams and either require 
homogeneity in the resources used by the participants or 
interpret the streams to best represent a synchronized 
environment. 

Gesture Support 

When people work together, they use their hands to point, 
circle,or motion in a wide variety of ways about the 
information that is being created (Tang, 1989). Support for 
gesturing in shared activity software comes in the form of a 
telepointer, that is, a cursor which is seen on everyone's 
screen that is controlled by the owner of the telepointer. 
Each participant in a shared work session has a personal 
telepointer, usually uniquely identified by shape or colour. 
Each participant also has their own cursor for moving 
around their work screen. 

Workstations therefore need to support multiple cursors, 
i.e., the cursor of the person at the workstation and the 
cursors of any of the participants that are in telepointer 
mode. Since cursor functioning is done at low levels in 
systems software and since most workstations are rarely 
equipped to handle more than one mouse input, this support 
suggests fundamental changes in workstation and/or 
operating system design. Workstations will need to 
support multiple cursors, and if necessary, to change the 
shape and form of the cursors as they move over the 
windows (Greenberg, 1991). 

Version Control 

In a replicated architecture, which workstation has the most 
recent version? Is it the one that left the session last, the 
one with the latest time stamp, or the one that has been 
designated to hold the latest version? If an individual 
works alone on the work product after the session, is the 
update automatically transferred to each of the other 
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workstations? What happens if two people work on the 
file separately but at the same time? Storage mechanisms 
need to be put in place that allow multiple users to maintain 
version control of the work product. This is not a hard 
technological problem. What is necessary is that the 
solution be incorporated in the basic architecture of the 
system rather than at the application level. Otherwise the 
latest version might reside on a workstation that is not in 
operation or connected to the next joint work session. 
Humans are notoriously bad at version control. It is best to 
leave this task to the system. 

Conclusion 

In this paper I have discussed a set of requirements that need 
to be met if the social nuances of collaboration are to be 
supported by cooperative work tools. What is important 
about these requirements is that the ability to meet them 
lies in solutions at the very heart of the underlying system 
architectures and communication structures that support 
CSCW. Small adjustments at the interface level will not fix 
the problems. In some cases, only redesign of the 
computer workstation and/or its operating system will 
make the problem solutions doable Some of the most 
viable fixes for one requirement contradict the most viable 
solution for another. Some of the solutions violate current 
ways of thinking about the world, i.e., we don’t need to 
guard against concurrency. I haven't presented solutions to 
these problems. I am not the systems guru. However, 
demonstrating a problem’s existence is always the first 
step in its solution. 
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Abstract 

The prediction of movement time in human-computer 
interfaces as undertaken using Fitts’ law is reviewed. 
Techniques for model building are summarized and 
three refinements to improve the theoretical and 
empirical accuracy of the law are presented. 
Refinements include (1) the Shannon formulation for 
the index of task difficulty, (2) new interpretations of 
“target width” for two- and three-dimensional tasks, and 
(3) a technique for normalizing error rates across 
experimental factors. Finally, a detailed application 
example is developed showing the potential of Fitts’ 
law to predict and compare the performance of user 
interfaces before designs are finalized. 

Keywords: human performance modeling, Fitts' law, 
input devices, input tasks 

Introduction 

Movement is ubiquitous in human-computer 
interaction. Our arms, wrists, and fingers busy 
themselves on the keyboard and desktop; our head, 
neck, and eyes move about attending to graphic details 
recording our progress. Matching the movement limits 
and capabilities of humans with interaction techniques 
on computing systems, therefore, can benefit from 
research in this important dimension of human 
behaviour. 

One focus in HCI research is in predicting and 
modelling the time for humans to execute tasks. 
Although encompassing a vast territory to be sure, one 
dimension is the time invested in movement. In fact, 
movement is an integral, seemingly innocuous 
component of many research questions in HCI: Are 
popup menus superior to menu bars? Which input 
device affords the quickest and most accurate 
interaction? Should a scroll bar in a text editor be on 


the left or right side of the CRT display? Can gestures 
replace commands in text editing? 

In this review paper, we will explore Fitts’ law, a 
powerful model for the prediction of movement time in 
human-computer interaction. We are motivated by (a) 
apparent problems in previous work, (b) the difficulty 
in interpreting and comparing published results, and (c) 
the need to guide future research using Fitts’ law. 

We begin with a brief tour of Fitts* law, and follow by 
describing some refinements to correct flaws or to 
improve its prediction power. Finally, derived models 
are used in an application example to illustrate the 
potential of Fitts’ law in assessing and comparing 
interface scenarios before they are finalized in products. 

A Brief Tour of Fitts’ Law 

The application of information theory to human 
performance modeling dates to the 1950s when 
experimental psychologists (e.g.. Miller, 1953) 
embraced the work of Shannon, Wiener, and other 
information theorists as a framework for understanding 
human perceptual, cognitive, and motor processes. 
Models, or “laws”, that persist today include the Hick- 
Hyman law for choice reaction time (Hick, 1952; 
Hyman, 1953) and Fitts’ law for movement time (Fitts, 
1954; Fitts & Peterson, 1964). 

According to Fitts, a movement tasks’ difficulty (/D, 
the “index of difficulty”) can be quantified using 
information theory by the metric “bits”. Specifically, 

ID =log 2 (2<4 / W) (1) 

where A is the distance or amplitude to move and W is 
the width or tolerance of the region within which the 
move terminates. Because A and W are both measures 
of distance, the term within the parentheses in Equation 
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1 is without units. The unit “bits” emerges from the 
somewhat arbitrary choice of base 2 for the logarithm. 
From Equation 1, the time to complete a movement task 
is predicted using a simple linear equation, where 
movement time (MT) is a linear function of ID. 

Figure 1 shows the serial tapping task used by Fitts 
(1954). In this experiment, subjects alternately tapped 
as quickly and accurately as possible between two 
targets of width W at a distanced. Obviously, as 
targets get farther away or as they get smaller, the tasks 
get more difficult and more time is required to complete 
the task. In Fitts’ experiment A and W each varied over 
four levels. The easiest task had A - 1 inch and W= 1 
inch for ID- log2(24/W) = log2(2) = 1 bit. The hardest 
task had A = 16 inches and W = 0.25 inches for ID = 
log2(128) = 7 bits. 



Figure 1. The serial tapping task used by Fitts 
(1954). 

The task in Figure 1 can be implemented on interactive 
graphics systems using targets displayed on a CRT and 
a cursor manipulated by an input device. A common 
variation is the discrete task — a single movement 
toward a target from a home position (see Figure 2). 
Target selection is usually accomplished by a button 
push when the cursor is over the target. 

The first use of Fitts’ law in HCI research was the work 
of Card, English, and Burr (1978) who applied the 
model on data collected in a text selection task using a 
joystick and a mouse. Subjects were required to move 
the cursor from a home position to a target — a word — 
and select it by pushing a button. Numerous other HCI 
researchers have subsequently used Fitts’ law. 
Examples include Boritz, Booth, and Cowan (1991); 
Giilan, Holden, Adam, Rudisill, and Magee (1990); 
Card, Mackinlay, and Robertson (1990); Epps (1986); 


MacKenzie, Sellen, and Buxton (1991); Walker and 
Smelcer (1990); and Ware and Mikaelian (1987). 


h- w H 


A 




cursor 



target 


Figure 2• A discrete task using a cursor and a 
target displayed on a CRT. 


A movement model based on Fitts’ law is an equation 
predicting movement time (MT) from a task’s index of 
difficulty (ID). Figure 3 shows the general idea. 



Figure 3. Movement time prediction. 


As expected, movement time for hard tasks is longer 
than for easy tasks. The prediction equation for the line 
in Figure 3 is of the form 

MT-blD (2) 

or 

MT =a + bID (3) 
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depending on whether or not the line goes through the 
origin. In both cases, b is the slope of the line. 

Since task difficulty is analogous to information, the 
rate of task execution can be interpreted as the human 
rate of information processing. For example, if a task 
rated at, say, ID = 4 bits is executed in 2 seconds, then 
the human rate of information processing is 4/2 = 2 
bits/s. This is also evident by examining Figure 3. 
Since the vertical and horizontal axes carry the units 
“seconds” and “bits” respectively, the slope of the line 
is in “seconds/bit”. The reciprocal of the slope is in 
“bits/s”. The latter measure Fitts called the index of 
performance (IP). Since IP is in bits/s, the term 
“bandwidth” is also used 

Intuitively, the higher the bandwidth the higher the rate 
of human performance since more information is being 
articulated per unit time. One of the strengths in Fitts* 
law is that measures for IP , or bandwidth, can motivate 
performance comparisons across factors such as device, 
limb, or task. It follows that performance in a human- 
computer interface can be optimized by selecting and 
combining those conditions yielding high bandwidths. 

Equation 2 is ideal since the prediction line goes 
through the origin. This is important for the intuitive 
reason that a movement task rated at ID = 0 bits is 
predicted by Equation 2 to take zero seconds, as 
desired. By Equation 3, however, a non-zero intercept 
implies that a task rated at ID = 0 bits will take “a” 
seconds to execute. (This point will surface again 
later.) 

Building a Fitts’ Law Model 

In building a Fitts* law model, the slope and intercept 
coefficients in the prediction equation are determined 
through empirical tests. The tests are undertaken in a 
controlled experiment using a group of subjects and one 
or more input devices and task conditions. On 
interactive computing systems, this could range from 
manipulating a cursor with a mouse and selecting icons 
to manipulating a virtual hand with an input glove and 
grabbing objects in a 3D virtual space. 

The design of experiments for Fitts’ law studies is 
simple. Tasks are devised to cover a range of 
difficulties by varying A and W . For each task 
condition, multiple trials are conducted and the time to 
execute each is recorded and stored electronically for 
statistical analysis. Errors are also recorded (and 
analysed as discussed later). Generally, measurements 
are aggregated across subjects resulting in one data 
point for each task condition. A typical data set is 
shown in Table 1 for a stylus in a serial point-select task 
mimicking Fitts* serial tapping task (MacKenzie, 1991). 


Table I 


Data From an Experiment Using a 
Stylus in a Point-Select Task 




ID 

MT 

Errors 

IP 

A* 

W* 

(bits) 

(ms) 

c%) 

(bits/s) 

8 

8 

1 

254 

0.0 

43 

8 

4 

2 

353 

1.9 

6.1 

16 

8 

2 

344 

0.8 

6.4 

8 

2 

3 

481 

1.7 

6.4 

16 

4 

3 

472 

2.1 

6.6 

32 

8 

3 

501 

0.6 

6.2 

8 

1 

4 

649 

8.8 

6.3 

16 

2 

4 

603 

2.1 

6.8 

32 

4 

4 

605 

2.7 

6.7 

64 

8 

4 

694 

2.5 

5.9 

16 

1 

5 

778 

7.0 

6.6 

32 

2 

5 

763 

3.4 

6.6 

64 

4 

5 

804 

2.3 

63 

32 

1 

6 

921 

8.5 

6.6 

64 

2 

6 

963 

33 

6.3 

64 

1 

7 

1137 

9.9 

63 



Mean 

645 

3.6 

63 



SD 

243 

3.1 

0.6 


Experimental units; 1 unit = 8 pixels 


The first three columns contain the independent 
variables target amplitude (A), target width (W), and the 
associated index of difficulty (ID) calculated using 
Equation 1. A and W each varied over four levels, 
yielding IDs of 1 to 7 bits. In the last three columns are 
the dependent variables movement time (MT), error 
rate, and the index of performance (IP = ID!MI). Each 
row entry is the mean of about 470 trials. The grand 
means were 645 ms for movement time, 3.6% for error 
rate, and 6.3 bits/s for the index of performance, as 
shown in the second last row. 

The next step in model building is to enter the 16 MT- 
ID points in tests of correlation and linear regression. 
The data in Table 1 yield a regression line with 
movement time (ms) predicted as 

MT = 53 + 148 ID (4) 

with a correlation of r = .992. Correlations above .900 
are considered very high for any experiment involving 
measurements on human subjects. A high r suggests 
that the model provides a good description of observed 
behaviour. 

The prediction equation has an intercept of 53 ms and a 
slope of 148 ms/bit. Converting the slope to its 
reciprocal yields an index of performance, or 
bandwidth, of 6.8 bits/s. Often the data points and 
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regression line are shown together in a scatter plot (see 
Figure 4). 



Figure 4. Scatter plot and regression line for 
the data in Table 1. 


As just shown* the index of performance can be 
calculated using a direct division of mean scores (IP = 
ID/MT) or through linear regression (IP = 1 lb, from MT 
= a+bID). The two methods yielded slightly different 
results: 6.3 bits/s using the direct method vs. 6.8 bits/s 
using the slope reciprocal. Although the disparity 
seems small, the correct method of calculation is in 
doubt Fitts used the direct method in his 1954 paper 
and linear regression in a subsequent study (Fitts & 
Peterson, 1964). Most (but not all) current researchers 
use linear regression. Notably, the disparity is 
systematic: If the line in Figure 4 is rotated counter¬ 
clockwise forcing it through the origin, the slope 
increases and the slope reciprocal decreases, thus 
reducing the disparity. 

If the regression line intercept is small the difference 
between the two bandwidth measures will be slight. If 


the intercept is large, the difference may be appreciable. 
In the latter case, a consistent, additive component of 
the task (such as a button push) may be the source of 
the intercept. 

An example is the model built by Card et al. (1978) for 
the mouse in a text selection task: 

MT= 1030 + 96 7D (5) 

Although the fit was again very good (r = .91), this 
equation has one of the largest intercepts in published 
Fitts' law research. The implication, of course, is that a 
movement task rated at ID = 0 bits will take 1.03 
seconds. 

The large intercept also casts doubt on the validity of 
the slope reciprocal as a measure for bandwidth. The 
slope of 96 ms/bit translates into a bandwidth of 10.4 
bits/s. The Card et al. (1978) experiment used 20 task 
conditions with a mean ID of 2.63 bits/s. The mean 
movement time was reported as 1.29 s; so, the 
bandwidth calculated using a direct division of means is 
2.63/1.29 or 2.0 bits/s. This differs from the slope 
reciprocal bandwidth by a factor of five! This is a 
concern if one wishes to generalize findings in terms of 
the human rate of information processing. Is the rate in 
this case 10.4 bits/s or 2.0 bits/s? As evident in Table 2, 
the regression line equations (and resulting bandwidths) 
vary tremendously in previous Fitts' law research using 
the mouse in point-select tasks. 

A third possibility for calculating the prediction model 
is regression-through-the-origin. Although it has never 
been employed, the method will produce the best-fitting 
line passing through the origin. 

Since linear regression produces the prediction line with 
the best fit, it is the preferred choice for model building. 
However, the proviso is added that the intercept must be 
small. “Small" in this context is on the order of a few 
hundred milliseconds — a value which can reasonably 
be attributed to random variation in measurements. 


Table 2 

Prediction Equations and Bandwidths From Fitts’ Law 


Studies Using a Mouse in Point-Select Tasks 


Study 

Prediction equation (ms) 

Bandwidth (bits/s) 

Boritzetal., 1991 

MT= 1320 + 430 ID 

23 

Epps, 1986 

MT= 108 + 392 ID 

2.6 

MacKenzie et al., 1991 

MT = -107 + 223 ID 

4.5 

Han et al., 1990 

MT= 389 + 175 ID 

5.7 

Cardetal., 1978 

MT= 1030 + 96 ID 

10.4 

Gillan et al., 1990 

MF- 795 + 83 ID 

12.0 
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Refinements to Fitts’ Law 

Despite an extremely good fit in empirical tests, Fitts’ 
law is a frequent target for critical reviews (e.g., Meyer, 
Smith, Kornblum, Abrams, & Wright, 1990; 
MacKenzie, in press; Welford, 1968). Numerous 
problems surface under close examination or when 
findings are compared across studies. These have 
motivated corrections or refinements to the model. In 
this section we review these and offer suggestions to 
guide researchers in applying the law. 

Formulation for Index of Difficulty 
Early examinations of the law noted a consistent 
departure of data points above the regression line for 
‘"easy” tasks {ID < 3 bits). A new formulation for ID 
was proposed by Welford (1960) to correct this: 

/D = lo g2 (AW+0.5). (6) 

Many researchers, including Fitts, noted an improved fit 
using Equation 6. The Welford formulation was used 
by Card et al. (1978), whose findings were subsequently 
elaborated in the Psychology of human-computer 
interaction (Card, Moran, & Newell, 1983). Not 
surprisingly, many HCI researchers citing Card and 
colleagues adopt the Welford formulation (e.g., Boritz 
et al., 1991; Gillan et al., 1990). 

It has also been argued that Fitts, in formulating his 
model, deviated unnecessarily from Shannon’s original 
work in information theory (MacKenzie, 1989; 
Shannon & Weaver, 1949), and that a more 
theoretically sound formulation for the index of task 
difficulty is 

/D = log2(A/W+l). (7) 

In terms of MT, the prediction model becomes 

MT= a + b log2(A/W+ 1). (8) 

Equation 8, known as the Shannon formulation, is 
preferred because it 

• provides a slightly better fit with observations, 

• exactly mimics the information theorem 
underlying Fitts* law, and 

• always gives a positive rating for the index of 
task difficulty. 

The last point above is understood by examining the 
three formulations for ID. Using the Fitts or Welford 
formulation (Equations 1 & 6), the index is negative if 
the amplitude is less than half the target width; that is, 
A < W/2. At the very least, a negative rating for task 
difficulty is a nuisance. From Equation 7, as A 


approaches zero (for any W ), ID approaches 0 bits, but 
never becomes negative. Obviously, the latter effect 
has strong intuitive appeal. Although for the one¬ 
dimensional paradigm an amplitude less than WI2 only 
occurs when the starting position is inside the target 
(see Figure 2), very small A:W ratios are fully possible 
when the law is applied in two dimensional tasks. This 
is demonstrated in the next section. 

Extension to Two Dimensions 
It is important to remember that the experiments 
^undertaken by Fitts and most other experimental 
psychologists tested one dimensional movements. This 
is evident in Figures 1 and 2. Since both target 
amplitude and target width are measured along the same 
axis, it follows that the model is inherently one 
dimensional. 

HCI researchers employing Fitts* law invariably use 
target selection tasks realized on a two-dimensional 
CRT display. The shape of the target and the angle of 
approach, therefore, must be considered carefully in 
applying the model. If the targets are circles (or 
perhaps squares), then the one-dimensional constraint 
remains largely intact. (The “width” of a circle is the 
same, regardless of the angle of measurement!) 
However, if targets are rectangles (e.g., “words”), the 
situation is confounded. We can still view the 
amplitude as the distance to the centre of the target; but 
the definition of target width is unclear. This is 
illustrated in Figure 5. 

In applying the model in 2D tasks, a critical question 
arises: What is target width? The default strategy is to 
consistently use the horizontal extent of the target. We 
call this the “STATUS QUO” model for target width. 
Unfortunately, a STATUS QUO model yields 
unrealistically low (sometimes negative!) estimates for 
task difficulty when, for example, a short and wide 
target, such as a word or series of words, is approached 
from above or below at close range. At least two 
examples of this exist in the literature. Gillan et al. 
(1990) used Fitts’ law in a target selection task using 
strings of characters as targets while varying the 
approach angle and approach distance. One extreme 
condition saw a 26-character (6 cm) target approached 
diagonally from a distance of 2 cm. Welford’s 
formulation was used, so the index of task difficulty 
was ID = log2(A/W+ 0.5) = log2(2/6 + 0.5) = -2.6 bits. 
The negative rating is troublesome. (A similar example 
is found in Card et al., 1978). 

One result of task difficulty extending to the left of ID = 
0 bits is that it becomes certain that a positive (probably 
large) intercept emerges under linear regression. This is 
because conditions with ID = 0 bits or less correspond 
to conditions that actually occurred in the experiment. 
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No doubt, such tasks will take a non-trivial (positive) 
amount of time. 



Figure 5. Fitts’ law in two dimensions. The 
roles of width and height reverse as the 
approach angle changes from 0° to 90°. 

We suggest two ways to correct this problem. The first 
is to use the Shannon formulation for ID, which for the 
example above would increase the rating to log 2 ( 2/6 + 
1 ) = +0.42 bits. 

A second and additional strategy is to substitute for W a 
measure more consistent with the 2D nature of the task. 
Consider Figure 6 . The inherent ID constraint in the 
model is maintained by measuring W along the 
approach axis. This is shown as W* (read “W prime”) 
in the figure. Notwithstanding the assertion that 
subjects may “cut comers” to minimize distances, the 
W* model is appealing because it allows a ID 
interpretation of a 2D task. 

Another possible substitution for target width is “the 
smaller of W or H”. This pragmatic approach has 
intuitive appeal in that the smaller of the two 
dimensions seems more indicative of the accuracy 
demands of the task. We call this the “SMALLER-OF” 


model. This model is computationally simple since it 
can be applied only knowing A, W, and H. The W* 
model, on the other hand, requires A, W, H, and and 
a geometric calculation to determine the correct 
substitution for W. The SMALLER-OF model is limited 
to rectangular targets, however. 

An experiment was conducted to test the different 
models for target width on a standard target selection 
task using a mouse. The design employed a balanced 
range of short-and-wide and tall-and-narrow targets 
approached from various angles. The results indicate 
that both the SMALLER-OF and W* models are em¬ 
pirically superior to the STATUS QUO model and that 
the difference between the SMALLER-OF and W 9 mo¬ 
dels is insignificant (MacKenzie & Buxton, in press). 

k-w-a 



Figure 6. What is target width? Possibilities 
include W* (the width of the target along an 
approach vector) or the smaller of W or H. 

In summary, the SMALLER-OF or W 9 model is 
recommended in applying Fitts’ law to two dimensional 
tasks. We should note that extensions to three 
dimensional tasks easily follow from the arguments 
above; although Fitts’ law has yet to be tested in 3- 
space. Finally, note that the W* model reduces to the 
STATUS QUO model for one dimensional tasks. 

Normalization and the Speed-Accuracy Tradeoff 
The reciprocity between the speed of actions and the 
subsequent accuracy of responses has been well 
documented in experimental psychology and human 
factors engineering (e.g., Hancock & Newell, 1985; 
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Pew, 1969). Despite this, researchers all too often base 
performance analyses solely or largely on task 
completion time measurements while paying little 
regard to the errors that accompanied — and equally 
contributed to — performance. Comparative 
evaluations based on movement time criteria are 
difficult when faced with disparities in error rates. That 
is, if condition A was faster than condition B, but had 
more errors, it is uncertain which condition is better. 

The most important refinement to Fitts’ law, perhaps, is 
the technique for accommodating spatial variability or 
errors in responses. The techniques calls for target 
width to be adjusted based on the distribution of “hits” 
(selection coordinates) about each target. Thus, at the 
model building stage, W is a dependent variable rather 
than an independent variable. The claim is that the 
technique increases the power of Fitts’ law since 
normalized models inherit a known and consistent error 
rate. In particular, comparisons within and between 
studies are strengthened by a “level playing field”. 

The output or “effective” target width (W e ) is derived 
from the observed distribution of “hits”, as described by 
Crossman and Welford (Welford, 1968, p. 147). This 
adjustment lies at the very heart of the information- 
theoretic metaphor — that movement amplitudes are 
analogous to “signals” and that end-point variability 
(viz., target width) is analogous to “noise”. 

The technique is illustrated in Figure 7. When a 
nominal error rate of 4% occurs (Figure 7a), no 
adjustment is required (W e = W). When a different 
error rate occurs, target width is adjusted by multiplying 
it by a ratio of z scores obtained from statistical tables 
for the unit-normal curve. For example, if 2% errors 
were recorded on a block of trials when selecting a 5 
cm wide target, then W e = 2.066/2.326 x 5 = 4.45 cm 
(Figure 7b). 1 The analyses proceed as before except 
using “effective 7Z)s”, calculated using W t instead of W. 

Applying this technique is essential if comparisons 
within or across studies are attempted. Rephrasing and 
earlier point, if performance on condition A was 6 bits/s 
and performance on condition B was 5 bits/s, we’d like 
to conclude that condition A was superior; however, in 
the absence of identical or normalized error rates, such 
a conclusion is weak and perhaps wrong. Fitts’ law 
models have appeared in published research 
accompanied by error rates from 0% to 25% (see 
MacKenzie, in press); yet the technique is rarely 
applied. It is strongly recommended that future 
research adopt the method. 


^The technique is described elsewhere in full detail 
with examples (MacKenzie, in press). 


h— w —N 

i * 



(a) 




Figure 7. The method for normalizing 
responses. In (a) the error rate is 4% so no 
adjustment is needed. In (b) the rate is 2% so 
an adjustment is made. 


Applying Fitts’ Law: An Example 

One challenge in research is relating the findings to 
real-world problems confronting practitioners in the 
field. Despite its recognition as one of the more robust 
models for human movement, Fitts’ law has rarely 
migrated from the research lab. 

An instance of Fitts’ law actually being used in a 
product is a computer-aided design tool called Jack , 
developed at the University of Pennsylvania (Badler, 
Webber, & Kalita, 1991). In Jack three dimensional 
human figures (simulated on a CRT display) are 
programmed to execute motions such a picking up 
objects or making adjustments on a simulated control 
panel. Since the researchers did not know how fast to 
program movements in Jack y they called upon Fitts’ 
law. A derived Fitts’ law model was embedded in Jack 
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to provide the duration for movements, based on the 
distance to move and the size of the terminating region 
for the move. The visual result is quite natural. 

In this section, further applications for Fitts’ law models 
are explored and a specific example is developed. 

First, we must recognize when not to use Fitts’ law. 
The law is a prediction model for rapid, aimed 
movement. A variety of user input activities do not fit 
this description, including drawing, inking, writing 
cursive script, and other temporarily constrained tasks. 
Furthermore, some devices are inadequately modeled 
by Fitts’ law. Isometric joysticks are force sensing and 
undergo negligible motion. As a model for human 
movement, it seems odd to apply Fitts’ law where no 
limb movement takes place. Card et al. (1978) found 
that performance data for an isometric joystick were 
poorly described by Fitts’ law. After the data were 
decomposed by amplitude, however, the model fit well, 
yielding a series of parallel lines across amplitude 
conditions. Other devices, such as those for velocity 
control, may also display characteristics inappropriate 
for a Fitts’ law model. 

For the more common pointing devices (such as the 
mouse, trackball, or stylus), and for common point- 
select or drag-select tasks, however, Fitts’ law has the 
potential to assist in the design and evaluation of 
graphical user interfaces. Questions of the form “How 
long will this task take?” can be answered using Fitts’ 
law prediction equations if certain conditions exist. If 
the tasks are rapidly executed with negligible or known 
mental preparation time, system response time, device 
homing time, etc., then there is a good chance Fitts’ law 
can assist in evaluating alternative methods. 

Consider the case of deleting a file (icon) on the Apple 
Macintosh computer. Three possible methods are listed 
below. 

DRAG-SELECT: Drag the icon to the trash- 
can. (This is the traditional Macintosh 
method.) 

POINT-SELECT: Select the file icon with a 
point-select operation, then select the trash- 
can icon with a second point-select operation. 

STROKE-THROUGH: Stroke through the 
icon. This method uses a button-down action 
beside the file icon followed by dragging 
(stroking) through the icon and a button-up 
action on the opposite side. 

The STROKE-THROUGH method is an example of input 
which mimics a natural gesture in manuscript editing 


(Hardock, 1991; Kurtenbach & Buxton, 1991). A 
reasonable assumption for the stroking gesture is that 
the button-down action occurs on the left of the icon in 
the centre of a region the same size as the icon, and that 
the button-up action occurs similarly in a region on the 
right of the icon. 

A possible screen layout is shown in Figure 8. The 
trashcan icon is located at the bottom right of the 
screen. The file icon is placed in the middle of the 
screen at a distance of 14 cm from the trashcan. Both 
icons are 2 cm square. The button-down and button-up 
regions for the stroking gesture are shown in dotted 
lines. 

The calculations proceed using derived pointing and 
dragging models for the mouse (MacKenzie et al., 
1991): 

Pointing model: 

MT= 230 + 166 ID (9) 

(7F= 6.0 bits/s) 

Dragging model: 

MT = 135 + 249 ID (10) 

(IP = 4.0 bits/s) 

The pointing model applies to the POINT-SELECT 
method, while the dragging model applies to the DRAG- 
SELECT and STROKE-THROUGH methods. Note also 
that for the STROKE-THROUGH method the amplitude 
is 4 cm, rather than 14 cm. 

Once the initial move to the icon is complete, the time 
to delete the file icon using each method is calculated as 
follows: 

DRAG-SELECT: 

MT = 135 + 2491og 2 (14/2 + 1) 

= 135 + 249 x3 

= 882 ms (11) 

POINT-SELECT: 

MT =230+ 166 log 2 (14/2 + 1) 

= 230+ 166x3 

= 728 ms (12) 

STROKE-THROUGH: 

MT = 135 + 249 log 2 (4/2 + 1) 

= 135 + 249x 1.58 

= 528 ms (13) 


Graphics Interface ’92 



148 


This result suggests, using Fitts’ law analyses, that the 
traditional method of deleting a file on the Macintosh is 
slower than two alternate methods. (The STROKE- 
THROUGH method is 40% faster.) The example is 
simplistic, however. Other issues such as methods for 
deleting multiple files or for un-deleting files must be 
considered too. 

Even though the rate of information processing is lower 
during dragging than during pointing, the STROKE- 
THROUGH method, which is a dragging operation, is 
faster than the POINT-SELECT method. This is due to 
the combined effect of the intercepts in the prediction 
equations and the different movement amplitudes. 
Using the stroke-through method, the predicted MT is 
independent of the distance ( A ) between the file icon 
and the trashcan icon since the movement amplitude is 
nominally set at twice the file icon’s width. However, 
the predicted MT decreases with A for the POINT- 
SELECT method. This suggests there may be a cross¬ 
over point below which the POINT-SELECT method is 
faster. Knowledge of this may play a critical role in 


selecting an appropriate method. In the current 
example, the cross-over point is calculated by equating 
the POINT-SELECT and STROKE-THROUGH 
predictions as follows: 

230 + 166 lo g2 (A/2 + 1) = 

135 + 249 log2(A/2 + 1) (14) 

Solving Equation 14 yields A = 2.72 cm. So, the point- 
select method is faster than the STROKE-THROUGH 
method only for amplitudes less than 2.72 cm. 
Certainly, this is a minority of cases. Other possibilities 
could be explored too, such as increasing the size of the 
trashcan icon relative to file icons. 

Future Possibilities 

In the hard science-soft science debate, Newell and 
Card (1985) hold that “striving to develop a theory that 
does task analysis by calculation is the key to hardening 
the science” (p. 237). Indeed, future applications of 
Fitts’ law may include “embedded models” as an 
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Figure 8. Screen layout for Fitts’ law example. 
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integral part of user interface design and management 
systems. The scenario envisioned goes something like 
this: A user interface is patched together in story-board 
fashion with a series of screens (with their associated 
soft buttons, pull-down menus, icons, etc.) and 
interconnecting links. The designer puts the embedded 
model into “analyse mode” and “works” the interface 

— positioning, drawing, selecting, and “doing” a series 
of typical operations. When finished, the embedded 
model furnishes a predicted minimum performance time 
for the tasks (coincident with a nominal or 
programmable error rate). The designer moves, 
changes, or scales objects and screens and requests a 
reanalysis of the same task. 

This is a rough first approximation. At a higher level, 
an embedded model is an “agent” in beta testing, 
monitoring activities and profiling performance. The 
profile catalogs a variety of facets of the interface. For 
example, a “locus of control” could identify frequently 
used screens or high-use regions on a display. In a 
word processing system, for example, depending on the 
implementation for scrolling, a user may work mosdy at 
the bottom of the screen. Knowledge of this may imply 
that a menu bar could be placed at the bottom of the 
display rather than at the top. 

A more powerful embedded model performs sequence 
analysis. How long does it take to get from point A to 
point B through a series of intermediate steps? Or, of 
alternate ways, which is the fastest? Hypermedia 
environments with embedded links facilitate such 
analyses, since an explicit and external state-transition 
description may not be needed. The agent acts on the 
screen definitions and links (as they exist in the 
application) in evaluating alternative or optimal paths. 

An embedded model is more than a software routine 
incorporating Fitts’ law. System and user performance 
constants are needed, similar to those in the Keystroke- 
Level Model (Card, Moran, & Newell, 1980). A 
parametric analysis could identify bottlenecks or 
optimal combinations. For example, decreasing 
pointing time by 10% vs. decreasing user keystroke 
time by 10% may have vastly different effects on 
overall task completion time. If a task can be 
accomplished two ways (e.g., 4 point-select operations 
vs. 20 keystrokes), which is the fastest? A parametric 
analysis could identify cross-over points across settings 
(as shown earlier). The designer could establish ranges 
and weights for parameters, and an agent, armed with 
embedded models, would take it from there. 

Fitts’ law may also participate in user-adaptive systems 

— systems with a human interface which changes to 
accommodate a user’s capabilities and limitations 
(Rouse, 1988). Control systems for air traffic, ground 


traffic, power generation or industrial processes are 
potential instances. One can imagine several human 
operators interacting with a complex system by 
manipulating iconic controls in response to system 
events. As system dynamics change, the demands on 
operators change. Models such as Fitts’ law (and/or the 
Hick-Hyman law) could measure the load on operators 
(in bits/s) or predict their performance. In safety- 
critical settings, it may be possible to systematically 
allocate tasks to workers to maintain set-points of sub- 
maximal performance. 

Human-computer interaction has advanced by leaps and 
bounds in recent years. We can attribute this primarily 
to the improved interfaces advanced through the 
technologies of mouse input and bit-mapped graphic 
output. There is an ongoing and valuable need for the 
prediction and modelling of user activities within such 
environments. As human-machine dialogues evolve 
and become more “direct”, the processes and limitations 
underlying our ability to execute rapid, precise 
movements emerge as performance determinants in 
interactive systems. Powerful models such as Fitts’ law 
can provide vital insight into strategies for optimizing 
performance in a diverse design space. 
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Abstract 

Arcball is an input technique for 3-D computer graphics, 
using a mouse to adjust the spatial orientation of an object. 
In Arcball, human factors and mathematical fundamentals 
come together exceptionally well. Arcball provides con¬ 
sistency between free and constrained rotations using any 
direction as an axis; consistent visual input and feedback; 
kinesthetic agreement between mouse motion and object 
rotation; and consistent interpretation of mouse position. 
Attention to mathematical detail facilitates the tasks of 
users and implementors. Users say that as a general-purpose 
rotation controller Arcball is easier to use than its nearest 
rival, the Virtual Sphere. It is also more powerful, and 
simpler to implement. 

Resume 

Arcball est une interface utilisateur pour un systeme de 
visualisation 3D utilisant la souris pour orienter les objets 
dans 1’espace. Dans Arcball, facteurs humains et concepts 
mathematiques se marient exceptionnellement bien. Arcball 
assure la consistence entre rotations libres et contraintes 
quelle que soit la direction de leur axe. Consistence entre 
Tentree et la reponse; accord kinesique entre mouvements de 
la souris et rotations de Tobjet; interpretation consistente 
des positions de la souris. Les utilisateurs affirment que le 
controleur universel de rotation Arcball est plus simple & 
utiliser que son concurent direct: «the Virtual Sphere». II est 
aussi plus puissant et plus simple h implementer. 

Key Words: object movement, view movement, mouse, 
user interface, interactive graphics, 3D graphics, rotation, 
orientation, human factors, quaternion 

Introduction 

In computer simulated scenes, most objects are manipulated 
as if they were rigid, even when they are rubber balls or 
robot arms. (Arms are divided into linked rigid pieces.) 
When one point of a rigid body is fixed by a translation, the 
remaining variability in position is called the body’s 
orientation. Arcball is an input technique which allows 
users to adjust orientation using a mouse. The design is 
unusual in that it considers mathematical fundamentals as 
well as human factors to address a difficult problem. 


Although we find it easy to pick up a small object in one 
hand, and turn it this way and that to examine it, it has 
proved much harder to devise a mouse interface which feels 
nearly as natural. One problem is that changes in orien¬ 
tation can be made in three independent directions—for 
example, a rotation about a left-right x axis, about an up- 
down y axis, or about an in-out z axis. In contrast, a mouse 
can move in only two independent directions. 

A deeper problem is the curved geometry of orientation 
space, which is quite different from the flat space of mouse 
movements. For example, a 360° rotation leaves the 
orientation unchanged, while a 180° z rotation gives the 
same orientation as a 180° x rotation followed by a 180° y 
rotation. The first case would be like pushing the mouse 
straight forward and finding it back where it started, while 
the second would be like raising the mouse in the air by 
pushing it forward and to the side. 

Moreover, closed loops of mouse motion may not produce 
the closed loops of rotation one expects. Rotate an object 
by +90° a\ +90° y, -90° x % and then -90° y, and it is not 
back as it started, but off by 120°. This “hysteresis” effect is 
not inevitable, and would not be tolerated in a translation 
controller. The point is not that such behavior precludes 
undoing a complete drag sequence, but that it is unforgiving 
during dragging. Yet only Arcball avoids hysteresis, by a 
more careful mapping of mouse input to rotation. 

Alternative mouse input techniques were recently evaluated 
in [Chen 88]. These include simulating sliders or treadmills; 
selecting a coordinate axis (by menu or by mouse button) 
then dragging; approximating a trackball; and so on. All 
separate rotations into independent .r, y, and z angles (at 
least internally), yet psychological studies show that 
mental models of rotation do not [Carlton 90]. The virtual 
trackball of [Hultquist 90] computes an instantaneous 
rotation axis directly, but still exhibits hysteresis. 

To avoid hysteresis, we must begin with the mathematical 
fundamentals. From those, we are immediately led to the 
basic Arcball design. Then we will see how to augment free 
rotation with a constraint mode. After pausing to admire an 
elegant quaternion implementation, we end by evaluating 
Arcball’s success as a general-purpose rotation controller. 
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Mathematical Fundamentals 

Any orientation of a rigid body can be given by a single 
rotation, a turn about some axis, starting from an agreed- 
upon reference orientation. Furthermore, the combination 
of any number of rotations can be given by a single rotation 
[Euler 1752] [Goldstein 80]. 

The combination law takes different forms, depending on 
the rotation parameters used. Behind the various formulae, 
however, is simple spherical geometry. A rotation R about 
axis a by amount 0 can be represented on a sphere as any 
directed arc of length 1 1 2 0 in the plane perpendicular to a, 
with positive angles giving a counter-clockwise direction 
around a. (See Fig. 1.) When the end of the first arc is made 
to coincide with the beginning of the second, two sides of a 
spherical triangle are formed. The arc completing the tri¬ 
angle, from the beginning of the first arc to the end of the 
second, represents a single rotation which has the same 
effect as performing R\ followed by Ri. (See Fig. 2.) Since 
rotations do not commute, when the order of combination is 
reversed, a different arc results. Otherwise, this is like 
vector addition. 

Consider the example given earlier of 180° rotations about 
x and v giving a 180° rotation around r; this is represented 
by a 90° arc from the north pole to the equator, followed by 
a 90° arc along the equator. The third leg of the triangle is 
indeed a 90° arc, and represents a 180° r rotation. That this 
is the correct result is easy to verify using a physical object. 
Furthermore, half-length arcs were essential, as full-length 
arcs would have predicted a result of no rotation. 

The hysteresis example is more complicated, for we must 
slide arcs around on their great circles in order to add them. 
First we go from 45° north down to the equator, then 45° 
east. We then slide the diagonal result arc back by its 
length, so it ends where we began. We are half done. Now we 
add a 45° arc to the north pole, giving a result arc also going 
due north. We slide this down so it ends on the equator, then 
add a final 45° arc going west. The outcome of all this is a 
60° diagonal arc, representing a 120° rotation around the 
axis (1,-1,-1). This is not at all a simple closed path, but it 
is how the rotations truly combine. 


From these two examples we can see how previous rotation 
controllers go astray mathematically. Most do not use any 
kind of spherical model, and those that do rotate “physical’' 
spheres through full-length arcs. Unlike translations, rota¬ 
tions do not admit a choice of “C:D ratio” in this aspect of 
their model; only half-length arcs are permitted. 

The mathematical and physical spheres are similar in that, 
while arc lengths differ, their planes or axes correspond. 
While the length problem is surprising, and perhaps con¬ 
fusing, it is also unavoidable. (How many kids have thought 
adding fractions would be much simpler if we could just add 
numerators and denominators separately?) For an excellent 
discussion of the relevant mathematical and physical 
theory, see Chapter 4 of [Biedenham 81]. 

Arcball 

Arcball takes its design—and its name—directly from the 
mathematics. Consequently, rotations can be displayed, as 
well as input, in a graphically meaningful way. 

Suppose some object is selected on the screen. To change 
its orientation, the user simply draws an arc on the screen 
projection of a sphere. The arc is a great arc specified by its 
initial and final points, which are given by the mouse down 
and up positions. (See Fig. 3.) As the mouse is being 
dragged, its current position is used to compute the arc. Thus 
the object—or a faster-drawn stand-in—turns with the 
mouse. The direction and amount of turn are those of the 
half-length arc model described above. 

A great arc—the shortest spherical path between two 
points—lies in the plane containing the two points and the 
center of the sphere. If the end points lie exactly opposite 
each other (which for Arcball can only happen if they lie on 
the sphere silhouette), the plane of the arc is not defined. In 
this application, however, that doesn’t matter; opposite 
points imply a 360° rotation, which leaves the orientation 
unchanged. In effect, opposite points on the circle are “the 
same point.” 
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This permits a useful feature: If the mouse is dragged out of 
the circle at some point the arc can be made to reenter at— 
wrap around to—the opposite point. (See Fig. 4.) Although 
the arc end jumps across the circle, the orientation being 
controlled changes smoothly and naturally. There is thus no 
limit on the amount of rotation. 

We give the user graphical feedback as the mouse is dragged, 
by drawing a “rubber-band” arc from the initial point to the 
current point, and updating a picture of the scene to show 
the rotation of the body or camera. The sphere and rubber- 
band arcs can be displayed separately from the scene image, 
but preferably are drawn transparently on top of it. 

Constrained Rotation 

Spatial manipulations can be complex, and often are easier 
with some constraints [Bier 86]. With Arcball, fixed axis 
constraints are a natural extension. Axes can be chosen 
from sets of any size, based on the object, the environment, 
the view, or other sources. Natural axes to allow include the 
view coordinate axes, the selected object’s model space 
coordinate axes, world space coordinate axes, normals and 
edges of surfaces, and joint axes of articulated models (such 
as robot arms). 

Remember that every arc lies in a plane perpendicular to its 
rotation axis. If the two mouse points on the sphere are 
orthogonally projected onto a fixed plane through the 
sphere center then radially projected back onto the sphere, 
they will necessarily become points giving a rotation about 
the fixed axis perpendicular to that plane. (Fig. 5 illustrates 
a body --axis constraint.) 

A method must be provided to ask for constraint, and to 
select the axis to use. Many choice mechanisms, such as 
menus, are possible, but one is particularly attractive. 
Superimpose on the sphere a great circle (the front half) for 
each axis from a limited set. Thus for body coordinate axes, 
three mutually perpendicular arcs would be drawn, tilted with 
the object. When the mouse is clicked down to initiate a 
rotation, the constraint axis selected—and the only one 
shown—will be that of the nearest arc. So that the user 
doesn’t have to guess, we dynamically highlight the nearest 


arc as the mouse is moved around prior to clicking. Mouse, 
menu, or keyboard combinations can be used to select 
among axis sets (e.g., SHIFT-click for camera axes, CTRL- 
click for body axes, unmodified click for no constraints). 


body 

x-axis 



Figure 5. Constrained z rotation 
Implementation 

All the code for Arcball is here. The sphere can be indicated 
by drawing its silhouette circle. To transform cursor coor¬ 
dinates on the screen into a point on the sphere, the center 
of the circle is subtracted from the cursor coordinates giving 
a radial vector, which is divided by the radius of the circle to 
give two of the coordinates on the unit sphere. If the cursor 
lies outside the circle, that is easily corrected now. The third 
sphere coordinate is obtained as the quantity which makes 
the sum of the squares I. Specifically, let the cursor screen 
coordinates be screen. x and screen. y, let the center of 
the circle be at center . x and center . y, and let the radius 
on the screen be radius. Then the coordinates on the 
sphere are given by 
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pt.x <— (screen.x - center.x)/radius; 
pt.y <— (screen.y - center.y)/radius; 
r <— pt.x*pt.x + pt.y*pt.y; 

IF r > 1.0 

THEN s <- 1.0/Sqrt[r]; 
pt.x <— s*pt.x; 
pt.y <r- s*pt.y; 
pt.z 4- 0.0; 

ELSE pt.z i — Sqrt[1.0 - r]; 

When a constraint axis is being used, the sphere point is 
projected onto the perpendicular plane, flipped to the front 
hemisphere if necessary, and renormalized before being 
used. If the point lies on the axis, an arbitrary point on the 
plane must be chosen. Flipping exploits the wrap effect. 

dot <— V3_Dot[pt, axis]; 

proj 4- V3_Sub[pt, V3_Scale[axis, dot]]; 

norm <— V3_Mag[proj]; 

IF norm > 0 

THEN s <— 1.0/norm; 

IF proj.z < 0 THEN s 4- -s; 
pt <— V3_Scale[proj, s]; 

ELSE IF axis.z =1.0 

THEN pt <r- [1.0, 0.0, 0.0]; 

ELSE pt <— V3_Unit[[-axis.y, axis.x, 0]]; 

Incidentally, to Find the closest arc simply constrain with 
each axis in turn, and pick the one that gives the nearest 
point. The nearest constrained point will have the largest 
dot product with the free point. 

Having obtained the initial and final end points by this 
means, the rotation in unit quaternion form [Shoemake 85] 
is the product of the final point times the conjugate of the 
initial point: q =Pi/>o*- Essential quaternion facts are: (1) a 
unit quaternion q = (v, w] = [a\ y, z, vr] consists of a scalar w 
which is cos 0/2 (where 0 is the rotation angle), and a vector 
v which is sin 912 times a unit vector along the rotation 
axis; and (2) the product of two quaternions gives the 
combination of the rotations they represent, and is non- 
commutative. The formula given amounts to setting the 
quaternion vector to the cross product of the initial and final 
points, and the quaternion scalar to their dot product. No 
trigonometric functions are required, only simple 
arithmetic. 

[q.x, q.y, q.z] <- V3_Cross[pO, pi]; 
q.w <— V3__Dot [p0, pi]; 

The new orientation of the rigid body is given by the 
product of the quaternion for the orientation when dragging 
started with the quaternion we’ve just derived from the user’s 
mouse input: 

qnow 4- QuatMul[q, qstart]; 

This product uses only 16 multiplies and 12 adds, and the 
result can be converted to a matrix at about the same cost 


[Shoemake 89]. If a graphics pipeline with 4x4 matrix 
multiply is available, accumulation and conversion can be 
done at essentially no cost [Shoemake 92J. 

From these quaternion formulae we can verify that spherical 
triangles behave as described earlier. Arc lengths will be 
half the rotation angle, since we rotate by twice the inverse 
cosine of the dot product, which is cosine of the arc length. 
Also, the axis of rotation is taken from the cross product, 
which is perpendicular to the vectors from the center to the 
end points. To go from p 0 top\, we use P\Pq*< and we 
continue from p\ to p 2 using p 2 P\*- The combined effect is 
given by the product P2p\* P\Po** which for these unit 
quaternions is equivalent to p 2 pi _1 P\Po~* - P 2 P 0 *' 
corresponding to the third leg of the triangle. We can also 
see that opposite points are equivalent, since -q gives the 
same rotation as q . 

For many purposes, the unit quaternion is most convenient, 
however a quaternion can easily be converted to other forms 
[Shoemake 85], Although modem systems use quaternions 
already, an Arcball implementation can certainly be done 
without them. The essential step will still be to compute the 
dot product and cross product of the arc endpoints. 

A few observations may help simplify the arc drawing code. 
We can approximate an arc with N line segments if we know 
an endpoint and a point 1/A/ of the arc length away, by 
reflection. Reflect the first point across the second, then the 
second across the third, and so on. If 8 is the dot product of 

Pt and p /+A , then p,+ 2 a = 28p, +A - Pt- We 8 just once ’ 
then ignore the z coordinates. Given the endpoints, po and 
and the arc length, Q. = cos -1 PqP\> we can compute the 
extra point as (p () sin (AM) QfN + p\ sin Q/A/)/sin Q. An arc 

for constraint axis a = [.v, v, z] is split in two. If s = V l-z 2 
is non-zero, the endpoints will be [-vz/.f, -yz/s, s) and 
±[y/s , -x/Sy 0]; otherwise the “arc” is the sphere silhouette. 

Orientation graphing 

To go the other way, from a unit quaternion to a pair of 
points on the sphere, first pick an initial point—say a 
point on the sphere edge—perpendicular to the quaternion 
vector; then obtain a final point as the product of the 
quaternion times the initial point. The following suffices. 

s SqRt[q.x*q.x + q.y*q.y]; 

IF s = 0.0 

THEN pO 4- [0.0, 1.0, 0.0] 

ELSE pO 4- [-q.y/s f q.x/s, 0.0]; 
pl.x <— q.w*p0.x - q.z*p0.y; 

pl.y q.w*p0.y + q.z*p0.x; 

pl.z <— q.x*p0.y - q.y*p0.x; 

Also, if desired, the initial rim point can be negated when 
that would give a shorter arc: 

IF q.w < 0 

THEN pO <— [-pO.x, -p0.y f 0.0]; 
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Evaluation 

At this point in the paper, it’s a good bet that the graphics 
programmers and mathematicians are happy, but the readers 
who go to SIGCHI conferences are wondering what became 
of the user in this user interface. Their concern is legitimate, 
for as (Gentner 90] observed, “good engineering practice 
can lead to poor user interfaces.” 

A good user interface is quickly learned and easily remem¬ 
bered; gets the task done quickly and with few errors; and is 
attractive to users [Foley 90]. These goals are more likely to 
be met if the design uses simple and natural interactions in 
“the user’s language,” requires little memorization, provides 
feedback and shortcuts, and is consistent—both with itself 
and other interfaces [Nielsen 90]. 

How well does Arcball meet these criteria? We can make a 
preliminary assessment before looking at user tests. It is 
manifestly simple to use. There are no multiple sliders, 
knobs, or mouse buttons. Selecting constraints is as simple 
as holding down a key and clicking on an arc. 

Dragging on a sphere is fairly natural. The object turns the 
same direction the mouse moves, for a natural kinesthetic 
correspondence. We can repeat an orientation by repeating a 
position, as holding physical objects leads us to expect. On 
the other hand, half-length arcs are probably not in “the 
user's language,” so are a potential problem. 

Users must remember to hold down a key to get constraints, 
but they never have to remember which direction is object, 
world, or view a\ y or r. The interface itself quickly reminds 
users of how it behaves. (As for remembering constraints, it 
may be helpful to add a persistent constraint mode, toggled 
with a menu item.) 

Arcball is rich in feedback—more so than most controllers. 
The object rotates for a sense of direct manipulation, and a 
rubber-band arc shows the net effect of a drag. Constraint 
mode is signaled both by muscle tension [Buxton 86], and a 
visual display of arc choices. The arc to be selected is high¬ 
lighted, for further constraint feedback. The orientation of 
the object can be graphed as yet another feedback arc. It is, 
of course, possible to augment Arcball with numeric output 
(and input) when that is meaningful. 

Arcball hardly needs shortcuts, but there are three. First, the 
availability of constaints can be considered a shortcut for 
times when they are desirable. (Constraints can also be used 
as “training wheels.”) Second, wrap-around makes large 
rotations easier; but notice that half-length arcs already let 
the user rotate by 360° around any axis with a single drag. 
Third, dragging outside the circle is an easy way imitate the 
use of a screen normal constraint axis. Angle detents can be 
added to make, say, 15° angle multiples easier. 

Consistency is one of Arcball’s strongest features. Object 
motion is consistent with mouse motion, as noted, and lack 
of hysteresis is a very powerful kind of consistency. Also, 
at any time within any drag, mouse movement between the 
same points will always turn the object exactly the same 


way. Arcs will be drawn for dragging, for constraints, and 
for object orientation; all are interpreted within a consistent 
context. Some interfaces have “trouble spots,” such as 
gimbal lock, at certain angles; Arcball has none. Interfaces 
like the Virtual Sphere that depend on incremental mouse 
movements can behave badly if sampling is slow or 
coordinates are noisy; Arcball will not. There is little 
difference between use of the mouse in constraint mode and 
free mode, so Arcball can be a general-purpose controller. 

The Arcball consistency of constrained rotation with free 
rotation means users need learn only one interface for many 
purposes. Consider the manipulation of a manikin arm with 
rotary' joints [Badler 86]. The shoulder has a ball joint and 
so can rotate freely, while the elbow can bend around only 
one axis. With other interfaces, a different style of input 
might be required for each joint. A thumbwheel is a common 
choice for the elbow, but using three wheels for the shoulder 
is awkward. Even for the elbow, the relationship of wheel 
motion to spatial motion may change when the shoulder is 
rotated, so a horizontal stroke of the wheel corresponds to a 
diagonal bend of the elbow. Arcball eliminates all these 
difficulties, and simplifies new possibilites. For example, 
the foot can be forced to pivot on a sloping floor without 
raising the heel or toe, simply by constraining its rotation 
to be around the normal to the floor. 

User interfaces cannot be evaluated adequately on paper, yet 
empirical measures of advantage are often hard to obtain. 
Even the study in [Chen 88] failed to find significant 
performance differences between two very different 
controllers, and only concluded the superiority of the 
Virtual Sphere over the method of [Evans 91] based on user 
comments. Yet informal evaluation should not be quickly 
dismissed, as studies have shown it can be informative 
[Nielsen 90]. 

Informal tests of Arcball suggest that its visual feedback 
provides important cues for understanding its behavior, that 
it is valuable to have both free and constrained modes, and 
that eliminating hysteresis is helpful. Arcball rotates ob¬ 
jects twice as far as might be expected, yet few users realized 
that; when they did, it was not a problem. A number of users 
were slow learning how to rotate around the screen normal. 
This confusion was possible because they were given no 
initial hint about how Arcball worked, in order to learn as 
much as possible about their expectations and responses. 
Nevertheless, without exception, they quickly learned to 
use, and like, Arcball. Many described it as “a sort of 
trackball.” Trackballs usually have only two degrees of 
freedom, which may explain the difficulty mentioned. 

Further comment on visual feedback is in order. Unlike 
other controllers, Arcball can draw a meaningful rubber band 
arc while dragging, but it was not clear whether it was worth 
cluttering the picture. In fact, users appreciated the arc, and 
noted that it was a strong cue to the spherical nature of the 
controller. The circular silhouette of the controller sphere 
might also be eliminated; however without it, Arcball is 
much more difficult to use. Finally, a constraint axis can be 
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chosen as the nearest arc from some set; not surprisingly, it 
is better to draw the arc choices, and to highlight the closest 
one before the mouse is clicked to begin dragging. 

When specifically compared to Chen’s Macintosh demo of 
the Virtual Sphere, Arcball was the clear favorite.^ 
Hysteresis may be hard to describe in writing, but users 
easily noticed the different feel when they tried the 
controllers together, and preferred the feel of Arcball. The 
Virtual Sphere had a bothersome modal distinction between 
drags that started inside the circle and those that started 
outside; Arcball did not. Use of the shift key (consistent 
with some other Macintosh interfaces) invoked a limited 
constraint mode for the Virtual Sphere, but there was no 
visual feedback, and users found its behavior hard to 
understand. Some users were slow to notice the Arcball arc 
highlighting (a more distinct color should probably be 
used), but since only the selected arc remained visible when 
dragging began, they could see their mistakes. In this area, 
too, Arcball was preferred. As mentioned earlier, half-length 
arcs were unexpected, but not annoying. Their benefits of 
wrap-around, greater range of motion, arc feedback, and 
hysteresis elimination seemed more important than their 
lack of “physicality.” 

Conclusions 

Arcball is an elegant application of mathematical theory to 
interface design. Its behavior and its implementation are 
clean and simple. We can perform both free rotation and 
constrained rotation. In either case, the direction of mouse 
motion corresponds to the direction of object rotation. Lack 
of hysteresis makes Arcball more forgiving than other 
rotation controllers, since incremental motions are easily 
undone. Use of half-length arcs brings this and other 
benefits, without seeming unnatural. More user studies are 
needed, but mathematically, at least, Arcball is likely to be 
the best general-purpose rotation controller using a mouse. 

That said, there is a bigger picture. Arcball only controls 
rotation, and typical 3-D interactions certainly require 
translation [Houde 92] and possibly scaling as well—and 
along arbitrary axes [Shoemake 92]. Studies indicate 
important differences between manipulating objects and 
manipulating views [Ware 90]. For the latter, egocentric 
controllers [Mackinlay 90] may be more satisfactory. Users 
may feel that consistency of controllers should encompass 
all three transformations, or may decide that specific tasks 
warrant custom rotation controllers. Ultimately, users will 
choose the interfaces that serve them best. 

Since a single mouse position has only two degrees of 
freedom, a pair of positions—the ends of an arc—are used. 
This part of Arcball has wider applicability, including a 
translation controller to be described in a future paper. 


^ Readers who would like lo make the comparison for themselves can get a copy of the Arcball 
demo for the Macintosh by anonymous ftp to ftp.cis.upcnn.edu. directory /pub/graphics. 
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Abstract 

Although video has been used for many years to record data, 
few tools have been developed to help analyze video data. 
Current multimedia interfaces have severe cognitive and 
attentional limitations, reflecting technology-centred 
designs which have not profited from human factors theory 
and user-centred design. This paper discusses user require¬ 
ments for video analysis, from which we derive a set of 
functional specifications. These specifications are useful 
for evaluating existing systems and for guiding the devel¬ 
opment of new systems. A number of existing systems are 
briefly described, as is the VANNA Video ANNotation and 
Analysis system, which integrates video, non-speech 
audio, voice, textual and graphical data, and which incorpo¬ 
rates emerging technology, user-centred design and human 
factors theory. 

Keywords 

Video annotation, video analysis, analysis methodologies, 
usability testing, multimedia. 

Introduction 

Video is a vivid and compelling way of presenting informa¬ 
tion, of highlighting interesting experimental findings, 
and of illustrating unique concepts. It provides a highly- 
detailed, permanent record which can be analyzed in many 
ways to extract a variety of different types of information. 
These benefits have long been recognized in usability test¬ 
ing, training and education (e.g., Anacona, 1974; Dranov, 
Moore, and Hickey, 1980; Ramey, 1989; Nielsen, 1990). 
New applications in video mail (e.g., Buxton and Moran, 
1991), interactive multimedia systems (e.g., Ishii, 1990; 
Mantei, Baecker, Sellen, Buxton, Milligan, and Wellman, 
1991), behavioral research and computer supported cooper¬ 
ative work (e.g., Greif, 1988; Baecker, 1992) are driving 
an increasing demand for better tools for handling and ana¬ 
lyzing video. 

Consider the following example. In one typical video 
analysis application experimenters video taped two sub¬ 
jects in a collaborative programming task. The subjects 
communicated using a video link between their two distant 


locations. Experimenters were interested in noting when 
subjects were looking at the computer monitor and when 
they were looking at the video monitor. They additionally 
wanted information about whether subjects looked or did 
not look at the video image of their partner at times when 
they communicated. Finally, experimenters wanted to 
know what types of information were communicated and 
passed between the two subjects using the video link (e.g., 
diagrams, pointing to parts of the computer screen or man¬ 
ual, normal conversational gestures). The data resulting 
from such an analysis might include gaze information as it 
related to conversation, and data about information 
content. 

For video analysis tools to succeed they must support 
a wide variety of tasks and analysis styles while still easily 
capturing essential data. We need to gain new insights into 
the way people work with multimedia systems. Little is 
known about creating new classes of interface which 
manipulate information having temporal dependencies. 
We need to apply proven design methodologies, human- 
computer interaction and human factors theory. 

The objective of this paper is to provide the reader 
with an understanding of design issues for video analysis 
systems, using the VANNA system as an illustrative 
implementation of one such system. The first section of 
this paper describes user requirements for video analysis, 
enabling us to derive a set of functional specifications for 
building video analysis tools. These specifications have 
been applied to evaluate existing “landmark” tools and 
notation systems (e.g., Rein, 1990; Losada and 
Markovitch, 1990; Potel and Sayre, 1976; Roschelle, Pea, 
and Trigg, 1990) and suggest guidelines for the 
development of new multimedia tools. We then describe 
the VANNA system, which reflects these guidelines and 
illustrates a number of unique interface design approaches. 
Tests results for the VANNA system are presented and 
design implications are discussed. 

User Requirements 

Our intent is to provide a multimedia video analysis tool 
which is easily customized to address the characteristics of 
the task, the application and the user's personalized style. 
To achieve this task analyses were performed for multiple 


Graphics Interface ’92 



158 


users within single application domains and across many 
different application domains (e.g., usability testing, 
CSCW, behavioral studies). We also conducted literature 
reviews and surveys, examined existing systems used for 
video analysis, and interviewed users of these systems to 
determine which functionality the systems had in common, 
which functions were most frequently and least often used, 
and what the common complaints were. (See Harrison and 
Baecker, 1991; Harrison, 1991 for detailed discussions of 
the systems examined.) A summary of the most important 
results of this work is presented here. 

From the task analysis, we derived two key points 
related to the process of manipulating a video document. 
Users tend to work with video in one of two ways: annota¬ 
tion and detailed analysis . Annotation implies “note tak¬ 
ing.” Here users are attempting to capture data in real¬ 
time, in highly personalized and abbreviated ways. The 
annotation task is characterized by high cognitive and 
attentional demands. Detailed analysis typically occurs 
after the real-time annotation and does not have the same 
real-time constraints. In this case the user may make many 
passes over a given segment of tape in order to capture ver¬ 
bal transcriptions (protocol analyses), behavioral interac¬ 
tions, gestural or non-verbal information. As part of this 
detailed analysis, users may also wish to run statistical 
analysis, or summarize data in tables or graphs. 

Based on the user interviews and surveys of existing 
systems, we derived a set of user requirements which sup¬ 
port both the annotation and the detailed analysis process. 
These were grouped into four categories: coding the data, 
analyzing and interpreting the data, user interface and 
device control, and displaying the data. The coding cate¬ 
gory represents methods for entering the various forms of 
annotational and analysis data. Elements in the analysis 
and interpretation category are those which related to 
manipulating pre-recorded data, in order to form conclu¬ 
sions about the nature of the data. The user interface and 
device control category embodies some general principles 
for building user interfaces of video annotation and analy¬ 
sis systems. Finally, when displaying the data , there are 
several general requirements to guide presentation formats 
and capabilities. 

Coding the Data 

There are two kinds of coding: real-time or on-line coding 
which occurs during annotation, and off-line coding, which 
occurs during analysis. On-line or real-time coding may be 
thought of as a subset of the overall coding process, where 
the video may be viewed playing forward at normal speed 
only, with no opportunity for review. A restricted set of 
functions is used, which reflects the real-time constraints 
and high attentional demands. These capabilities allow 
the user to mark events (typically with a single button 
press, mouse click, or keyboard stroke), and allow entry of 
very short text comments. The user must be able to: 

• mark the occurrence of an event 

• mark the start and stop points for intervals. 

The system must be able to: 

• capture keystrokes 


• capture subject’s computer screen. 

The “off-line” coding process requires a more compre¬ 
hensive set of functions. This stage is characterized by the 
high usage of speed control and reviewing capabilities, and 
permits the user to perform detailed coding operations of 
data including: 

• user comments or general observations 

• verbal transcriptions of the conversation 

• non-verbal information, e.g., gestures 

• personality or mood measures. 

Our current findings indicate that user comments are 
typically fairly concise (estimated at less than 200 charac¬ 
ters for text, or 2-3 Figures for graphics). Verbal transcrip¬ 
tions of the conversation may be word-for-word transcrip¬ 
tions or might simply record specific spoken keywords. 
Non-verbal or gestural information may be described in a 
number of ways, including sketches, special coding 
schemes or symbols, and may even be embedded in the 
conversational record. This information is the most diffi¬ 
cult to represent and is therefore most often subject to 
encoding schemes, as demonstrated by some of the current 
notations (e.g., Heath, 1986, 1990). Personality measures 
or mood assessments are often also ranked and encoded. 
Most “mood” coding schemes in current tools are based on 
the Bales SYMLOG system for studying small group 
dynamics (Bales and Cohen, 1979). Several mood notation 
systems currently exist though none integrate video 
directly into the analysis tool (e.g., Losada and 
Markovitch, 1990; Rein, 1990). 

Analyzing and Interpreting the Data 

Once the video has been coded, any number of analyses 
may be applied to the data. The level of analysis is depen¬ 
dent upon the experiment objectives, hypotheses and 
experimental design, but some general capabilities are 
summarized below: 

• play “next” event 

• play “previous” event 

• group events 

• play entire group 

• play loopback i.e., play the same sequence over 
many times 

• keyword searching for text data 

• basic quantitative data — frequencies, averages, 
durations, variances 

• time series or interaction analyses 

• data exporting 

• merging of data for interjudge reliability. 

Interaction patterns play a significant role in many 
analyses. These patterns can be derived by statistical 
means, by time series analysis or by approximation 
through visual inspection of carefully formatted output. 
This last case provides a more simplistic view of the inter¬ 
actions by summarizing data on adjacent, aligned time 
lines. This allows users to visually inspect the data for 
recurring patterns, overlaps and gaps in interaction. Time 
lines facilitate the observation of process information (as 
opposed to content information). 
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User Interface and Device Control 

The user requirements and the attentional demands of. video 
analysis have direct implications for both the user interface 
and the mechanisms for device control. Critical interface 
issues include integration of the video images, device con¬ 
trols, and tool functionality, use of both auditory and 
visual feedback cues, consistency in the interface, and user- 
definable screen layouts. Technology issues include the 
degree of a user's control over the video devices and the 
choice of input devices. 

Integration of the “video monitor” with the annota¬ 
tion system on a single screen is crucial because video 
requires continuous visual attention; important events 
might be missed in a split second. This continuous moni¬ 
toring is required since one can neither predict the fre¬ 
quency of event occurrences, nor the modality for the event 
(i.e., in which channel the event will occur: auditory or 
visual). Additionally, spatially separated displays (as are 
prevalent with existing systems, e.g., Losada and 
Markov itch, 1990; Roschelle, Pea, and Trigg, 1990) pre¬ 
vent simultaneous access to multiple visual sources. Users 
must direct their visual attention away from the critical 
data source in order to locate and select functions in the 
analysis tool. The resulting visual scan time (and effort) 
between displays is unacceptable for analysis tasks, in par¬ 
ticular for real-time annotation. Finally, integration of 
displays solves problems with work space size limitations 
vs. equipment size requirements. 

The annotation system should have both auditory and 
visual feedback mechanisms. If the user is analyzing visual 
data the auditory feedback cues from the system would be 
used and vice-versa. This minimizes interference between 
system feedback and the primary task of analyzing the 
video data. Visual channels are typically differentiated in 
terms of spatial separation (i.e., different locations in the 
visual field). Auditory cues are differentiated by pitch, 
loudness, and tonal characteristics. The auditory cues 
should be non-speech to avoid confusion with the voice 
track of the video document. 

In order to successfully record detailed events, com¬ 
ments and information, the user requires automatic control 
of many of the video speeds from within the analysis tool. 
The minimum speed control requirements are: 

• high speed, e.g., fast forward 

• regular playing speed 

• frame by frame 

• paused at any single frame. 

Forward and reverse motion options should be appli¬ 
cable to any of these speeds. 

The tool must be capable of coding at a variety of tem¬ 
poral “resolutions”. This allows events to be coded at a 
variety of rates, such as: 

• every frame 

• every second 

• every minute 

• at random intervals. 


If multiple tapes are used for recording, users may need 
to automatically cue up any or all of the tapes relative to 
the position of a single tape. 

The coding process, and in particular the real-time 
annotation, has implications for the style of interface and 
the input devices chosen. Users need to access the various 
capabilities of the tool with interfaces which have low 
visual attentional demands. The kinds of mechanisms 
might include button presses, touch typing, the ability to 
point directly to the monitor using a touch screen or draw 
directly using a stylus. It may be desirable for interface 
mechanisms and graphic annotations to be overlaid 
directly on top of the video. The interface should avoid 
secondary monitors and graphics which require fine motor 
coordination for function selection. Additionally the 
mechanisms for representing the data should not require 
complex encoding schemes or cognitive mappings, but 
rather should favour a direct one-to-one mapping between 
the concept to be representing and the interface object (and 
corresponding label) used to capture the item. 

Critical in the usability of any tool is the ability to 
customize the screen. Users must be able to add or delete 
instances of objects to reflect their current analysis needs. 
This includes the ability to modify the tool in an ad-hoc 
manner during an analysis session. Providing a library of 
functions from which the users can “copy” and “paste” 
interface objects and subsequently resize and relocate them 
on the screen is one method of achieving this. 

When reviewing previously recorded data, users must 
be able easily to play back the previous item, the next 
item, and user-defined groups of items, independent of their 
actual location in the data file. 

Displaying the Data 

The presentation of data and results is perhaps one of the 
most under-developed aspects of current tools. A minimum 
requirement is the ability to print a copy of all data 
recorded. This is basically a “dump” of a log file, contain¬ 
ing reference time codes or frame locations, comments, 
transcriptions, diagrams, event markers, interval markers 
and keystroke logs. (Many tools do not extend their 
“output” capabilities much beyond this.) The result is a 
complex listing of data which is usually so dense that 
interpretation is difficult. The implication of this is the 
need for a variety of views or summaries of the results. 
This includes numerical analyses, time line representations 
and graphical plots. The user should be able to specify 
whether the analysis results are presented by experimental 
subject, by topic discussed, by artifact usage or by other 
criteria. 

Most existing tools present results and data in either 
tabular format or on a text-based time line (either horizon¬ 
tal or vertical). The use of colour and animation con¬ 
tributes greatly to the clarity and effectiveness of presenta¬ 
tion. These techniques have been greatly underutilized thus 
far. 

Often users wish to present short video segments 
which highlight interesting findings or which provide 
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good representations of general trends in the data. In order 
to achieve this, they need to be able to indicate the starting 
and stopping points for a number of sequences, and the 
order in which the sequences are to be played back. These 
sequences may be existing intervals marked in the coding 
stage, or they may be new sequences. The order of presen¬ 
tation of sequences should not be dependent upon the video 
recording medium, i.e., the tape media must support non¬ 
sequential playback. 

Functional Specifications 

From the user requirements a number of functional specifi¬ 
cations may be inferred. These have specific implications 
for tool functionality and for the user interfaces to video 
analysis systems. 

Coding the Data 

1. User-specified indexing of the video tape. 

Users may mark an event or an interval by indicating the 
starting (and stopping) position, typically by a button 
press. Still frames may also be used as index markers. 
These events or intervals can later be retrieved under com¬ 
puter control. 

2. Grouping of events or activities . 

Users can group similar events or activities together in 
user-specified classes and assign a unique index to each 
class. 

3. Experimenter observations or comments. 

Users can enter textual or possibly graphical comments, 
notes and observations. These are linked automatically to 
the appropriate segments of video. 

4. Verbal transcript analysis and keyword indexing. 

Users can enter conversational transcriptions of the audio 
track. Statistics may be computed on the keywords. 

5. Individual and group characteristics . 

Users can enter subjective assessment data for various mea¬ 
sures of personality and group dynamics such as Bales mea¬ 
sures (Bales, 1950). 

6. Non-verbal and gestural information. 

Users can enter data related to the observed gestural pat¬ 
terns, for corresponding video frames or segments. This 
information may be coded using a variety of notations, 
including symbolic notations such as Labanotation 
(Laban, 1956; Hutchinson, 1954). 

Analyzing and Interpreting the Data 

7. Keyword searching . 

Users can use keyword searches on any text data, including 
experimenter comments or verbal transcriptions. 

8. Keystroke and computer screen integration . 

If the subjects are required to use a computer, their 
keystrokes and/or computer screen is recorded and syn¬ 
chronized with the video. 

9. Access to text editors , statistics packages , graphics 
packages t plotting packages . 

Users can import/export data to/from other software pack¬ 
ages. 

10. Analysis for interaction patterns over time. 


Users can analyze the data by examining patterns in the 
occurrence of events or activities over time. Significant 
(frequent) patterns and interactions are highlighted on a 
time line. 

11. Support for inter judge reliability. 

A means of merging multiples codings should be available 
to support multiple judges and hence improve reliability of 
the data and subsequent analysis. 

User Interface and Device Control 

12. Digital control access to basic video functions. 

Users can stop, start, fast forward, and rewind video tape(s) 
and control the playback speed directly from the analysis 
tool. 

13. Retrieval and playback of previous and next indexed 
items. 

Based on the currently selected item or the current location 
on the tape, users can elect to play the next or previous 
item recorded. 

14. Retrieval and playback of sets of items using automatic 
indexing. 

Users can request that all events or activities belonging to 
a given class be played in sequence automatically. 

15. Direct manipulation interface. 

Users access the various capabilities of the tool using 
interfaces which have low visual attentional demands. The 
kinds of mechanisms might include button presses, touch 
typing, the ability to point directly to the monitor using a 
touch screen or draw directly using a stylus. It may be 
desirable for interface mechanisms and graphic annotations 
to be overlaid on top of the video. The interface should 
avoid secondary monitors and graphics which require fine 
motor coordination for function selection. 

16. Simplified mental models. 

The users should have minimal mappings and coding 
schemes to represent events, activities and attributes. 

17. Ability to customize annotation screens. 

Users have access to a “library” of functions, from which a 
subset may be chosen and laid out on a screen to form user- 
definable interfaces. Users can relocate and resize any 
object on the screen. 

18. Multi-media or hypermedia analysis record. 

The final analysis record consists of text, audio, and video 
is integrated into a single multi-media document. 

19. Automatic synchronizing mechanism for multiple 
tapes. 

If multiple tapes are used for recording, users can automati¬ 
cally cue up any or all of the tapes based on the position of 
a single tape. 

Displaying the Data 

20. Customizable presentation and summarizing capabili¬ 
ties. 

The number and levels of mappings and coding schemes are 
minimized to facilitate interpretation of data and results. 
Results should be presented in user-defined categories. 
Users specify which items to include or exclude in each 
summary view. Several standard views are provided. 
Textual and graphical formats are both available. 

21. Time line display of events. 
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Users can view the occurrence of events and duration of 
activities on time lines. Users can specify the number of 
time lines and the basis on which they are defined (e.g., per 
subject, per task, per medium) 

22. Animated and colour displays . 

Animation and movement patterns can be used to illustrate 
dynamics and capture the temporal dimension of behaviour. 
Colour can be used to distinguish and highlight variables 
or interesting results. 

23. Presentation of video segments . 

Users can mark video segments which illustrate relevant or 
interesting examples and produce an “edit list”. This edit 
list can be easily played back in any sequence. 

State of the Art 

At the time of this research, there were several interesting 
video analysis systems in existence, each providing a 
unique contribution to the field. The tools described may 
be divided into two categories: notation systems and video 
analysis tools. Notation systems are methods of represent¬ 
ing information extracted from video tape, though they 
may not be directly linked to or control the video itself. 
Video analysis tools control the video and integrate func¬ 
tionality such as described in the previous section. 

Notation Systems 

The Heath Notation System is one of the few encoding 
schemes which directly integrates detailed information 
about non-verbal communication and gestures with the cor¬ 
responding verbal transcripts (Heath, 1986; Heath, 1990). 
Typed punctuation symbols (e.g., —.[ , ]), rep¬ 

resent non-verbal events and activities. Information about 
intonation, speaking volume and speech characteristics is 
embedded directly into the transcript, while gestures, gaze 
and other non-speech information is represented above 
each line of verbal transcript This analysis method facili¬ 
tates observation of the interrelations between non-verbal 
communication patterns and verbal conversation, although 
the encoding scheme is complicated and makes accurate 
keyword searches difficult 

The Mood Meter system is a graphical notation sys¬ 
tem which is based on the Bales SYMLOG dimensions 
(Bales and Cohen, 1979) and which describes human inter¬ 
action and mood over time (Rein, 1990; Olson and 
Storrosten, 1990). The participants’ “mood” ratings are 
aggregated into a single group score, which is represented 
diagrammatically by concentric circles or stars of varying 
colour and density. The idea is to represent divergent 
groups by dispersed images and convergent groups by con¬ 
centrated images. One drawback of this tool is the reliance 
on cognitive mapping schemes for encoding participation 
and group mood. The mood data require interpretation to 
convert them to the Bales dimensions, followed by transla¬ 
tion to descriptive terms. Additionally, the mood diagrams 
reflect the aggregate group mood, making interpretation on 
an individual level difficult. 


Analysis Tools 

GALATEA is “an interactive animated graphics system for 
analyzing dynamic phenomena [notably biological cell 
movement] recorded on a movie film” (Potel and Sayre, 
1976; Potel, Sayre and MacKay, 1980). In this system 
computer graphics or animated images are superimposed 
directly on the film. Users have the ability to “write 
directly” on the film with a digitizing stylus over the video 
screen image, giving Galatea a unique and truly direct 
manipulation interface. This input mechanism allows free 
hand drawing, data point entry, or handwritten notes. 
There is also a substantial easy-to-use button interface to 
the video controls which resides on the same monitor as 
the video image, though is not visible simultaneously. 

The Group Analyzer is one of the most sophisticated 
tools for representing group dynamics (mood) over time 
(Losada and Markovitch, 1990; Losada, Sanchez, and 
Noble, 1990). The presentation capabilities of this tool 
are exceptionally good, taking advantage of colour, anima¬ 
tion and time series analysis. The analysis component 
allows users to display both static and dynamic (animated) 
displays of the results in “field diagrams.” Users may dis¬ 
play an animation demonstrating how the group dynamics 
evolve and change over time (the dominance circles for 
each participant expand and contract). The field diagram 
may be used to reference the actual video tape. Entering the 
data requires training, however, since the coding forms are 
complex and make extensive use of cognitive mappings 
(per the Bales dimensions). Much of the coding is done in 
real time by trained experimenters while they are observing 
subjects. 

VideoNoter is a tool which allows users to create 
annotations and verbal transcriptions which automatically 
index either a video tape or disc (Roschelle, Pea, and Trigg, 
1990; Trigg, 1989). Annotations may be either textual or 
graphical, and can be composed and subsequently imported 
from external packages. VideoNoter's particular strength is 
the interface to the video control functions. It is also one 
of the few analysis tools which allows users to easily cus¬ 
tomize their own annotation screen. Users may define their 
own button oriented coding template for marking events of 
interest and may reorder the columns by select-and-drag 
operations. The function of each column is user-definable. 
Automatic control of video functions are accessible implic¬ 
itly through the worksheet and scroll bars, or explicitly 
through a menu bar. User have had to rely heavily on tex¬ 
tual entry of data from a keyboard directly into columns in 
the data file. This time-consuming data coding has 
restricted the use of this system. 

EVA is an interactive video annotation tool which 
allows both “on-line real-time” coding while the experi¬ 
ment is running, and “off-line” detailed coding with the 
stored video record after the experiment is competed. 
(MacKay, 1989; MacKay and Davenport, 1989). It allows 
experimenters to enter notes, verbal transcriptions, 
keystroke logging for the subjects, and symbolic cate¬ 
gories for organizing the recorded data. One interesting 
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capability of this tool allows the text transcriptions to 
appear as “subtitles”, synchronized with the video. One 
subtitle appears for each participant. Another facility, not 
seen in other tools, is the ability to automatically log 
keystrokes from the subjects and synchronize them with 
the video. These can be presented in a manner similar to 
the “transcription subtitles”. 

U-Test is a tool developed expressly for usability test¬ 
ing and is fairly representative of many usability testing 
systems (Kennedy, 1989). The emphasis is on “real-time 
on-line” coding. The tool is pre-programmed, by the 
experimenter, with a list of tasks that the subjects are to 
perform. For each task, a detailed list of steps and a set of 
anticipated possible errors which the subject might make 
are given. These “error buttons” may be considered spe¬ 
cific cases of experimenter-created event indices used to 
automatically index to specific points on the video tapes. 
A “timer on” and “timer off* function allows experimenters 
to set start and stop points for intervals of interest. 
Experimenters may also enter text comments and observa¬ 
tions. These are linked to the video tapes and may be used 
as indices to specific points on the tapes. The U-Test tool 
also provides the experimenter with reminders about when 
they must perform a certain action with regards to the 
experiment itself. 

One of the few tools designed specifically for inexpe¬ 
rienced users is Xerox EuxoPARC’S “virtual VCR” (Buxton 
and Moran, 1990). A graphical image of a VCR control 
panel is presented on the user's computer screen. This 
panel contains all of the standard control functions. In 
addition to the control functions, the users may “mark” the 
tape with indices and associated comments (in much the 
same manner as proposed earlier in this paper). Comments 
are restricted to short one line titles or notes. These “tags” 
have a designated start and stop point and a GOTO function 
for playback. 

These systems each address different problems in 
video analysis. They, and the VANN A system described 
below, are summarized in Table 1 (shown at the end of this 
paper) using the functional specifications described earlier. 

The VANNA System 

The VANNA (Video ANNotation and Analysis) system 
integrates, on a Macintosh, various multimedia elements 
into a single video document. User interfaces for the sys¬ 
tem were created using brainstorming sessions, iterative 
design, and rapid prototyping, resulting in many versions 
of the system over a short period of time (approximately 8 
versions in 4 months). We used direct manipulation inter¬ 
faces to support a number of important features described 
later in this paper. Additionally, we designed the system to 
support a variety of input devices including a touch screen, 
digital stylus, mouse, and keyboard. The system supports 
both real-time annotation and the detailed analysis of video 
data. 


Coding the Data 

Users define their own index markers by duplicating index 
buttons and assigning each a unique name. A single button 
press immediately creates an index label and links it to the 
corresponding location in the video document. These 
indices are used to capture the occurrence of important 
events in the video and can also be labelled to reflect rank¬ 
ings of behavioral data such as mood and mood changes. 
Similarly, to capture events having durations, a special 
index type called an interval is used, indicated using a 
start/stop button or switch. Users may define any number 
of indices or intervals. A typical coding screen is shown as 
Figure 1. 

Textual comments may be entered either alone or in 
conjunction with an index or interval. The comment win¬ 
dow is variable length, scrollable and editable. Verbal 
transcriptions may be thought of as a special case of com¬ 
menting and are therefore entered in a similar manner. 
Brief comments of less than 20 characters are typically 
used for real-time annotation, while lengthy paragraphs 
and verbal protocols are entered in the more detailed analy¬ 
sis stages. Currently comments may be explicitly linked 
to any event or interval using a special “link” button. 

Analyzing and Interpreting the Data 

All annotations are recorded in a log file, with each item 
type recorded in a different column (e.g., time, indices and 
intervals, comments). A typical VANNA log file is shown 
as Figure 2. The log file may be viewed, sorted by any col¬ 
umn, edited, searched and played back. Keyword searching 
and sorting are provided for indices, intervals, comments 
and transcriptions. Items may be played back by selecting 
(and hence highlighting) the desired item and pressing the 
“play” button. Simple built-in data analysis routines calcu¬ 
late the frequency of occurrence for each index label and 
interval, the average and cumulative durations for each 
interval, and the variability in duration for each interval. 
For more detailed statistical analysis or graphical plotting, 
data may be exported to an external package. 

User Interface and Device Control 

The VANNA system overview is shown as Figure 3 (at the 
end of this paper). VANNA simplifies technology access 
through software interfaces which send and receive video 
device commands through the Macintosh serial ports. 
Regardless of video device (e.g., VHS VCR, 8mm VCR, 
camcorder, video disc) the user sees the same iconic video 
button controls. Icons are based on the standard video 
controls found on the devices themselves. 

The video image, which may be generated by a video 
card or by a software solution such as Apple’s QuickTime®, 
appears as a window in the computer monitor. Users may 
magnify the any portion of the image (zoom in), may shift 
views of the image (pan), or may pull back from the image 
(zoom out). This results in a complete integration of 
video, audio and computer tools, solving many of the prob¬ 
lems outlined earlier. The video device(s) may reside in a 
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different room and act as a server(s) to many users. In fact, 
multiple users may alternately control the same device 
when working collaboratively. This integration also min¬ 
imizes work space size requirements (i.e., only the com¬ 
puter workstation is required). 

Once created, selected items or groups of items can be 
played back automatically. (Items may be intervals, 
indices, comments, or graphics.) The annotation system 
finds the appropriate location in the video document and 
begins the playback sequence. This provides users with the 
ability to easily create “edit lists” of video segments for 
presentation, based on a number of user defined criteria. 

The interface for entering annotations has been 
designed to reduce both perceptual and cognitive load. 
VANNA provides users with several default templates or 
screen layouts and a dictionary of functions. Users may 
add, delete, resize or relocate any object on a template, 
including the video window, by directly manipulating the 
objects themselves (e.g., cut, copy, paste, drag). Only the 
functions deemed necessary by the user are presented. This 
creates a completely customizable system. 

Multiple input devices are supported simultaneously 
(touch screen, stylus, mouse, keyboard, video shuttle speed 
dial). The users may rapidly switch between input devices 
as appropriate. A shuttle speed dial is available for con¬ 
trolling the video direction and speed. This allows users to 
take advantage of two-handed input techniques and parallel 
manipulation strategies (Buxton and Myers, 1986) by con¬ 
trolling the video speed with one hand and pressing but¬ 
tons with the other. This is particularly useful for detailed 
analyses when reviewing one segment of video many times 
at varying speeds to capture information. These devices, 
combined with the user definable screen layout ability, 
support both right and left handed subjects equally well. 

Any button press provides the user with both auditory 
and visual feedback. Buttons temporarily inverse high¬ 
light and a brief tone sounds (a “clicking” sound like a 
mechanical button). Different pitches distinguish indices 
from intervals (in addition to a different visual appearance). 
The graphics of the interval button changes to differentiate 
between open and closed intervals. Error tones are louder, 
with different classes of errors being distinguished by both 
pitch and tone. Only critical errors display messages, min¬ 
imizing interference with the primary annotation task. 

Displaying the Data 

Currently, the system produces reports which display data 
items in columns. Users define the number of columns and 
the column content. Columns typically contain the time, 
index label or interval label, user comments, and verbal 
transcriptions. The entire contents of the data file may be 
printed or users may elect to filter the data and print out a 
pre-determined subset. Additionally, users may view or 
print out reports using the simple statistics and frequency 
counters built into the system. Built-in statistics currently 
include frequency of occurrence for each index and each 
interval, and the cumulative duration, average duration and 
variability in duration for each intervals. 


User Testing 

The VANNA system is undergoing extensive usability test¬ 
ing with a variety of real tasks and applications. These 
include user interface testing for experiments with pie 
menus, behavioral studies of writing strategies in joint 
authoring, studies of gaze patterns for video usage in col¬ 
laborative programming, usability testing scenarios for 
several complex devices, studies of human error in kinesi¬ 
ology, and naturally the analysis of video analysis ses¬ 
sions using the VANNA system. A number of interesting 
results have been observed thus far. 

Users rapidly adopted personalized analysis styles, 
with some favouring real-time annotation and frequent 
“loopbacks” and others favouring analysis in slow motion 
with few loopbacks. The button style index markers 
worked very well and the performance was good for both 
real-time and non-real-time analysis. In both the real-time 
and the non-real-time analysis processes users tended to 
enter comments which were less than 20 characters, though 
the comments input field was variable length to allow for 
much longer text items. 

Users entered data in sequences of “grouped” button 
presses. These groupings reflected the data characteristics; 
some types of events frequently occur in rapid succession. 
Users physically layed-out the coding screens to reflect 
these data characteristics by clustering related buttons 
together. As the data characteristics changed over time, 
users dynamically altered the screen layout to reflect these 
changes. 

Most users adopted an off-line data coding strategy of 
entering about 10 items which they then reviewed in the 
log file for correctness. They would immediately make 
changes if necessary and then return to the coding process 
and enter another 10 items. This process resulted in many 
users requesting a automatically scrolling window view of 
the most recently entered data as part of the coding screen. 
By merging a brief view with the coding screen, users felt 
that their revision and coding process would be simplified. 
This additionally provides users with feedback about what 
has been recorded in the data log file. 

Comments could be used as data items in themselves or 
could be descriptors for index markers by explicitly 
“linking” them. This latter case required the use of a “link” 
button which has proved to be problematic. Users tended 
to forget to press this link button when they wished to 
associate the comment with a specific index marker. 
Changing the size, location, and label of the link button 
did not correct the difficulty. A better mechanism for 
achieving this functionality is needed. 

Keyword searching was used extensively in the play¬ 
back process especially on experimenter comments. The 
ability to sort by index names and then playback many 
items from same group was also found to be very useful and 
was used extensively. 

The touch screen was not used extensively but this can 
be primarily attributed to the angle of the screen, which 
was found to be too tiring for long sessions. (Most ses¬ 
sions lasted at least 1 hour with an average time of about 2 
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hours). For future use the touchscreen needs to be mounted 
in a recessed surface at about a 35 degree angle. 

Although video zoom was provided the performance 
was too slow. Users wanted to rapidly magnify and later de- 
magnify portions of the video image. Better video tech¬ 
nologies now make this possible. 

Summary and Conclusions 

The VANNA system was designed by applying proven 
methodologies in HCI and human factors theories. It illus¬ 
trates one method of achieving a cost-effective and useful 
desktop video annotation and analysis system. 
Preliminary results in user testing indicate that the VANNA 
system is suitable for a number of applications for a num¬ 
ber of users. Insights from user testing have encouraged us 
to contemplate a number of extensions. 

For example, we have recently implemented a portable 
version of the annotation subsystem which runs on a lap¬ 
top computer, the PowerBook* Portable Vanna can be 
taken into the field and allows real-time annotation to 
occur simultaneously with data capture. 

Graphical overlay capabilities are under implementa¬ 
tion. This will allow users to draw sketches using a stylus 
directly over the video image. We believe that this will 
prove a useful mechanism for capturing non-verbal and ges¬ 
tural data in behavioral analyses. 

VANNA will also be linked to an automatic audio 
tracking system (Sellen, 1992). This system separates, 
logs, and graphically plots over time audio contributions 
from up to four meeting participants. This facilitates the 
analysis of speech patterns such as pauses, interruptions, 
and simultaneous speech. 

Finally, we are currently implementing graphical time 
line displays and are investigating the use of color displays 
and animation for more vivid presentations of results. 
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Abstract 

We describe a new data parallel algorithm for raytrac¬ 
ing. Load balancing is achieved through the use of pro¬ 
cessor allocation, which continually remaps available re¬ 
sources. In this manner heterogeneous data bases are 
handled without the usual problems of low resource us¬ 
age. The proposed approach adapts well to both ex¬ 
tremes: a small number of rays and a large database; 
a large number of rays and a small database. The algo¬ 
rithm scales linearly—over a wide range—in the number 
of rays and available processors. We present an imple¬ 
mentation on the Connection Machine CM2 system and 
provide timings. 

Resume 

Cet article presente un nouvel algorithme par allele pour 
le lancer de rayons. IS allocation des processeurs , qui 
distribue la tache aux resources disponibles, permet de 
garder une charge bien repartie. Ainsi evitons nous les 
problemes usuels dus aux ressources de bas niveau tout 
en manipulant des structures de donnees heterogenes. 
Notre approche s’applique aussi bien a un faible nombre 
de rayons et des donnees importantes qu’i un nombre 
important de lancer de rayons sur des donnees de taille 
limitee. L’algorithme fonctionne en temps lineaire en le 
nombre de rayons et de processeurs disponibles. Nous 
presentons une implementation sur le systeme Connec¬ 
tion Machine CM2 et donnons des temps de calcul. 

Keywords: Massively parallel, SIMD, ray tracing. 

Introduction 

With the advent of massively parallel computers, many 
algorithms which require large computing resources have 
been developed to take advantage of these new ma¬ 
chines [2; 20]. While the use of parallel computers, like 
the Connection Machine CM2 system, has become almost 
common place in computational fluid dynamics and QCD 
theories [6], only relatively few graphics algorithms for 
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general purpose parallel computers have been published. 
We hypothesize that this is due in large measure to the 
extraordinary success and availability of cheap special 
purpose graphics hardware. 

Most of the work on this hardware (see for example [14; 
25; 1; 22; 27]) has focused on polygon scanline render¬ 
ing. At the same time, the development in rendering 
algorithms has been towards more sophisticated global 
shading models. Unfortunately, these can only make lim¬ 
ited use of commonly available graphics hardware (e.g. 
the use of z-buffers for radiosity and raytracing accelera¬ 
tion [9; 31]). There has also been some interest in the use 
of distributed processing and multi-processing, to accel¬ 
erate radiosity computations [28; 3], and ray tracing [16; 
26; 12]. As the computational needs of sophisticated 
global illumination models continue to increase and the 
size of data sets continues to grow (e.g. volume render¬ 
ing) the importance of algorithms for scalable, general 
purpose computers will continue to increase. 

Applications such as volume rendering, ray tracing, and 
radiosity have computational needs large enough to take 
advantage of highly parallel architectures. While, for ex¬ 
ample, ray tracing appears to parallelize trivially, the 
published research shows otherwise (see below). Many of 
the published algorithms struggle with the load balanc¬ 
ing issue which becomes worse with increasing numbers 
of processors. There are also examples—from volume 
ray tracing—where two different algorithms, which are 
equivalent on serial machines, can have radically different 
run times on parallel architectures [30], The Achilles heel 
of most graphics algorithms on parallel architectures is 
their high demand for general, versus regular, communi¬ 
cation. As the following review of previous work shows, 
the main focus in this area of research is on minimizing 
general communication. 

In the next section we discuss in detail previous work 
in parallel raytracing. Following that we describe our 
new algorithm and present timing results, finishing off 
with conclusions. The details of the implementation are 
described in the appendix. 

Previous work in parallel ray tracing 

Most previous work concerns the mapping of ray tracing 
onto MIMD hypercubes. Carter and Teague [8] discuss a 
simple scheme of replicating the entire database at every 
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node of an iPSC/2 computer. The speedup factor gained 
by the use of many processors is almost exactly the num¬ 
ber of processors in the system. The only issue is the 
assignment of pixels to processors. For this Carter and 
Teague use the obvious “pixel mod number of processors” 
com6 assignment (this same approach was also chosen 
by [24] for a ray tracer on the Pixel machine). Salmon 
and Goldsmith [29] consider this scheme as well. They 
go on to consider the more general case of databases that 
are too large to be replicated at every node. They use 
a bounding hierarchy data structure for their database 
and replicate only the top n levels of this structure at ev¬ 
ery node. In this way the more frequent intersection tests 
against the top of the hierarchy can be performed locally. 
The sub trees of level k > n are distributed across the 
nodes. Considering the statistics of intersection of rays 
against subtree nodes of the bounding hierarchy, they de¬ 
velop a procedure to determine the height of the subtrees 
which are deposited across processors. 

Another decomposition scheme using a bounding hier¬ 
archy was explored in Carter and Teague [7]. The en¬ 
tire bounding hierarchy tree, except for the leaf nodes, 
is replicated in all processors. The leaf nodes are dis¬ 
tributed across the processors. An LRU cache is used 
to maintain a set of primitives at each processor. Using 
a blocking scheme to distribute pixels across processors 
they naturally take advantage of the implied spatial co¬ 
herence. Load balancing is achieved by a master process 
which allocates pixel blocks to idle processors. 

A different set of algorithms uses SIMD architectures, 
such as the Connection Machine CM2 system, for data 
parallel ray tracing. While some of the design issues are 
similar, for example keeping global communication to a 
minimum, others are quite different. In data parallel 
algorithms, there is only one thread of execution, and 
load balancing takes on a new meaning. Instead of dis¬ 
tributing possibly different parts of the algorithm across 
processors, load balancing aims to keep the set of active 
processors as large as possible for every part of the algo¬ 
rithm. 

Delany [11] considers the case of a bounding hierarchy 
and its efficient traversal. The tree structure is mapped 
onto processors using the numbering of a preorder traver¬ 
sal of the tree [4]. Assuming that the bounding volumes 
are the same as the objects to be ray traced, nodes and 
leaves are of the same type. As is the case, for example, 
in sphere databases. The algorithm assigns each ray to 
a processor and at every iteration uses general commu¬ 
nication to fetch the next bounding volume as the rays 
traverse the tree. Using the Euler tour numbering it be¬ 
comes simple for each ray to find the processor address 
of the next tree node to intersect with. 

Delany has also described a ray tracer based on a space 
subdivision scheme [10]. In this algorithm all objects and 
rays receive a tag, indicating which leaf node in a uni¬ 
form octree partitioning of space they currently reside in. 
The least significant bit of the tag is used to differentiate 
between rays and objects. The algorithm proceeds by 
sorting rays and objects based on their tags. As a conse¬ 
quence, rays end up immediately adjacent (in processor 


space) to objects that they need to intersect with. Using 
the scan communication primitives 1 object data can be 
efficiently propagated to all rays which occupy the same 
voxel as the given object. By doing backward and for¬ 
ward scans on the bits of the tags in the sorted list of rays 
and objects, it is possible for a ray to compute the next 
non-empty voxel along its direction to within a power of 
2. In this way rays are advanced to the next parent voxel 
in the octree that contains another object and the clos¬ 
est intersection can be found in logarithmic time in the 
length of the ray. 

Discussion 

The MIMD hypercube algorithms are generally charac¬ 
terized by their difficulties with load balancing. For ex¬ 
ample Carter and Teague [8] replicate the entire database 
at every node. This leaves the distribution of rays to even 
out the load. For large numbers of rays—small numbers 
of processors in the system—the simple comb assignment 
does well. As the number of rays per processor decreases, 
though, the load imbalance goes up markedly. Further¬ 
more we do not consider the replication of the entire 
database at every node feasible as this limits the size of 
databases and leads to ever increasing waste of memory 
as the number of processors increases. Salmon and Gold¬ 
smith [29] distribute their database across the processors 
and use a static analysis to determine how best to achieve 
this. The actual performance numbers they give indi¬ 
cate even higher load imbalance for increasing numbers 
of processors (almost 40%). This is not surprising since 
the work estimates, on which the decomposition is based, 
exhibit large variance for small numbers of rays per pro¬ 
cessor; as would be the case with increasing numbers of 
processors. These concerns also apply to the different 
distribution scheme of Carter and Teague in [7]. Since 
their distribution approach adjusts dynamically, through 
the use of an LRU cache, it exhibits somewhat better 
load balancing behavior. Both approaches assume that 
communication is very expensive and explicitly maintain 
multiple processes at each node to mask the idle time 
associated with communication. Since these algorithms 
exist in a MIMD message passing context they easily deal 
with databases containing arbitrary mixtures of object 
types. 

The concerns of the SIMD algorithms are typically cen¬ 
tered around the fact that all processors need to execute 
in lockstep. While this simplifies program development 
greatly, it leaves these algorithms very vulnerable to low 
resource usage in the face of multiple object types. In 
particular, Delany’s algorithm [11] is even more restric¬ 
tive in that the bounding objects must be of the same 
type as the primitive objects themselves. Some flexi¬ 
bility in the object types is afforded by using a class 
of objects for which ray/object intersections can be ex- 

1 Briefly, scan operations execute an operator such as 
sum across ail ordered set of elements. These could be 
in a vector with a vector processor executing the instruc¬ 
tion along the length of the vector, or an ordered set of 

I processors with each element in its own processor. In the 
atter case the scan instructions execute in logarithmic 
time in the length of the set [4]. 
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pressed in terms of a common meta type (e.g. quadric 
primitives, with planes as degenerate quadrics, etc.). In 
this algorithm the top of the tree is, at least initially, a 
communications bottleneck. During the initial steps of 
the algorithm all rays attempt to fetch data from the 
processor which holds the root node. This causes high 
congestion in the router and results in slow execution 
of the general communication step. As the algorithm 
progresses this problem is alleviated since the rays start 
distributing themselves across all nodes of the hierar¬ 
chy and new rays are added at different times. Notice 
that in algorithms of this kind coherence can actually 
be disadvantageous as it implies high congestion in gen¬ 
eral communication patterns. A more serious problem 
with Delany *s algorithm is the fact that regardless of the 
object types, terminal and non-terminal intersections re¬ 
quire different treatment. Some processors, for example, 
may need to evaluate the shading model, while others 
need to execute further intersection tests with the hier¬ 
archy. Delany addresses this with the use of a finite state 
automaton which always goes to the state with the most 
processors waiting. Klietz [23] describes the use of these 
ideas in a framework allowing multiple object types—all 
of which are sub types of a common meta type—and the 
automatic object hierarchy construction as described by 
Goldsmith and Salmon [17]. Klietz reports linear scaling 
of the algorithm in the number of objects in the hierar¬ 
chy up to approximately 10000 (on a 32k processor CM2 
system), beyond which the performance of the general 
communication falls off markedly. 

The other algorithm proposed by Delany [10] exploits the 
ability of the CM2 to sort very fast [5]. It however also 
requires a finite state automaton for load balancing the 
states, e.g. need to intersect } need to advance , etc. At each 
step the algorithm enters into that state which has the 
most processors waiting. For multiple object types the 
number of states and the potential for each state having 
only a few active processors in it increases, potentially 
resulting in low resource usage overall. 

The above analysis shows that it is imperative to con¬ 
sider load balancing from the onset and assure that it 
scales with increasing numbers of processors. The goal of 
our work was to develop an algorithm which scales with 
the available resources—number of processors—as well 
as the problem size. In particular given that there are 
typically at least hundreds of thousands of rays it should 
be possible to take advantage of very large numbers of 
processors. All the MIMD algorithms discussed above 
assume small numbers of processors (< 1024). While 
this is not explicit in the design, it can be seen from the 
analyses. As the number of processors goes up the load 
imbalance increases. The data parallel ray tracing algo¬ 
rithms scale much better with the number of processors, 
but suffer—as do the MIMD—algorithms from unwieldy 
load balancing procedures. Even though load balancing 
is addressed load imbalance can still be very high [29]. 
These issues are furthermore exacerbated in the SIMD 
case with the use of different object primitives. 

In our algorithm we address the load balancing issue, as 
well as the issues connected with varying object types 


in the same database through the use of processor al¬ 
location. The basic observation is that even for small 
numbers of objects—say, a few triangles among many 
spheres in the same database—there are typically more 
rays potentially intersecting these few objects, than there 
are processors in the machine. Hence we can keep the 
available resources busy if we simply loop over the object 
types. This requires the algorithm to continually remap 
the resource usage. To be sure, this remapping requires 
general communication, but if the amount of computa¬ 
tion between remapping steps is significant, the cost of 
the general communication steps can be amortized. In 
the performance section below we will argue that the cur¬ 
rent algorithm achieves this. 

A new data parallel ray tracing algorithm 

A simple and straightforward method of data parallel 
ray/object intersection would intersect every object with 
every ray in a single intersection step and then extract 
the closest intersection. This is not only very wasteful 
but also impractical since it amounts to computing the 
full cross product of all rays and objects. For a database 
of 2 13 objects, and 256x256 pixels/rays we would require 
2 29 processors each executing one ray/object intersection 
in parallel. The algorithm we propose takes advantage of 
the fact that this cross product can be made sparse by 
observing that most rays can only polentially intersect a 
small fraction of the entire object database. The raytrac- 
ing literature is full of algorithms which use, for example, 
bounding hierarchies, to exploit this sparsity. 

The main task then is to develop an effective strategy to 
derive small lists of candidate objects for each ray and 
only intersect each ray with the objects on the respective 
list. Ideally this list contains only those objects which are 
actually along the ray. In order to find the list of candi¬ 
date objects for every ray we use a coarse space subdi¬ 
vision of the object database. Specifically we divide the 
world into equal sized voxels which are coordinate axes 
aligned. In a preprocessing phase [15], we find all objects 
that overlap a given voxel and maintain a list of these for 
each voxel. During the ray casting step all voxels that 
a given ray pierces are enumerated and used to access 
the per-voxel lists of objects. All objects retrieved in this 
way are candidates and are intersected in parallel with 
the given ray. A downward min scan on the computed 
intersection distance gives the closest object intersected 
by a given ray. 

Our technique of generating candidate sets is not unlike 
the shaft culling technique introduced recently by Haines 
and Wallace [19]. They consider ray casting in the con¬ 
text of radiosity form factor estimation. Candidate lists 
are comprised of all objects overlapping the the convex 
hull of an origin and destination object. This technique 
requires that for each ray an origin and destination object 
is known, as is the case in radiosity. For ray tracing this 
does not generally hold. Our technique, while coarser, 
does not depend on knowing such objects and thus also 
works for general ray tracing. Since we only use flat lists 
for each voxel our case corresponds to their open always 
strategy. 
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Figure 1: Each box corresponds to a processor in a 1 
dimensioned processor set. The parallel variable (pvar) 
alloc-size holds the number of processors to allocate. 
The address of each allocated segment is given by the 
sum-scan of alloc-size. The segment-bit pvar delin¬ 
eates the actual segments. 

The details of the SIMD implementation of the algorithm 
can be found in Appendix . However, since processor al¬ 
location is at the heart of most steps in the algorithm 
we will give a brief description of this data parallel pro¬ 
gramming idiom (see also [4]). It is through the use of 
processor allocation that we achieve the desired load bal¬ 
ancing. 


Processor allocation 

Just as serial algorithms use memory allocation to man¬ 
age dynamicadly changing demands, data parallel algo¬ 
rithms use processor allocation for the same ends. Con¬ 
sider, for example, a ray which is to be intersected against 
a candidate list of objects. Each ray, which is stored in its 
own processor, typically needs to be intersected against 
different numbers of candidate objects. In order to ex¬ 
ploit ray/object parallelism each ray-processor allocates 
a number of object-processors. This is accomplished by 
allocating a new processor set with enough processors to 
hold the sum total of requested processors. This new pro¬ 
cessor set is segmented so that each segment consists of 
as many processors as the associated requesting processor 
required (see figure 1). The allocating processors receive 
a pointer (processor address) to the segment allocated to 
them, which can be used to move data between the al¬ 
locating and allocated processors. The segment-bit can 
be used in segmented-scan operations to execute instruc¬ 
tions on a per segment basis. For example, propagating 
[segmented-copy-scan) ray data to all objects which need 
to be intersected with the given ray. Another typical use 
in our framework is the segmented-downtvard-min-scan 
which finds the minimum element of a given pvar on a 
per-segment basis. The result of this scan is left in the 
first processor of each segment, whence it can be retrieved 
easily. 

The processor allocation paradigm provides a general 
way to implement algorithms which dynamically require 
new resources of uneven length. Another attendant ad¬ 
vantage of this approach is the implied load balancing. 


Since each processor allocates as many new processors as 
it needs there are no idle processors in the new processor 
set 2 . 


Load balancing 

When ray tracing in parallel, load balancing is important 
in two places. First the amount of work necessary to in¬ 
tersect a given ray with the database varies from ray to 
ray and second the recursion depth of a given ray tree 
varies based on geometry—rays leaving the scene—and 
object properties—highly specular versus purely diffuse 
objects. We use processor allocation as described above 
to address both of these requirements. For a given set 
of rays we allocate enough processors per ray to hold the 
candidate objects from the voxels along the ray. In this 
way rays that, for example, leave the database will con¬ 
sume less resources than those which are going through 
an area with many objects. Similarly for each recursion 
level only those rays which need to be followed further 
allocate child rays. 

This approach is only feasible however, if the router 
performance is high enough to support the continuous 
remapping of resources during allocation. On vector com¬ 
puters this corresponds to their ability to perform scat¬ 
ter/gather operations efficiently. 

Performance 

We use processor allocation extensively throughout the 
algorithm. Since this involves general communication 
we need to be sensitive to the underlying communica¬ 
tion patterns. Semi-regularity in the routing pattern can 
lead to very high congestion rates since some form of 
regularity often implies that many messages need to go 
through only a few nodes in the communication network. 
In practice, it has been observed that some of the slowest 
general communication patterns are very regular, while 
random patterns tend to yield high throughput. In our 
case this implies that coherence, which is usually welcome 
and exploited, can be very disadvantageous. The loading 
of scene descriptions illustrates this point well. When 
reading in a database the ordering in the file often cor¬ 
relates with spatial coherence in the database. We have 
found in those cases that placing the objects into a lin¬ 
ear processor set in the order read in, yielded a running 
time higher then when assigning objects to bit reversed 
processor addresses. This effect depends on the actual 
database loaded. For the sphere flake with 7381 spheres 
and 1 quadrilateral (see [18]) we found the following rep¬ 
resentative times at a voxel scale of 80 3 : 


Mach. 

size 

linear 

load 

bit rev. 

load 

16k 

208.4 S 

170.8 S 

32k 

118.7 S 

98.3 S 


The granularity of the database voxelization exerts a 
much larger influence on the runtime. Two forces need to 

2 There can of course be idle processors if the total 
number of processors requested is not some integer mul¬ 
tiple of the number of physical processors. 
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Figure 2: Timings for various voxelization scales. A 
bounding box for the entire data base is cut into 10 3 
to 90 3 voxels. Machine size was 16k processors with an 
image size of 128 2 pixels. 



Figure 3: Timings for different machine sizes. All 
databases are 7381 spheres at an image size of 256 2 . 


be balanced here. If the voxelization of the world is very 
fine the candidate lists per ray are fairly small since only 
those objects overlapping a small neighborhood of the ray 
will be listed in the object fetch lists. On the other hand 
a fine voxelization of the world drives up the memory con¬ 
sumption of the system and forces us to subdivide any 
given ray into many and much smaller pieces. Figure 2 
shows timings for two different sphere flake data bases 
(see [18]) with 820 and 7381 spheres respectively. These 
databases are good test cases since they contain both very 
small and very large objects. They also contain a single 
quadrilateral. The timings include 3 recursive reflection 
levels and shading via 3 light sources (shadow feelers to 
all light sources at all intersections). All tests were timed 
on a CM2 running at 7MHz. 

The observed behavior with respect to voxelization scale 
remains for different machine sizes as well. In figure 3 we 
see data for the 7381 sphere database for an image size 
of 256 2 pixels for different machine sizes. When consid¬ 
ering larger images with all other parameters the same 
the runtime scales linearly in the number of rays. 
Considerable speedups can be gained from subdividing 
coarsely along the view direction and finely in the viewing 


plane. The following table gives one such example for the 
same scene as above on a 32k processor machine at an 
image resolution of 256 2 


Voxel 

Time 

Voxel 

Time 

scale 

in S 

scale 

in S 

160 2 x 30 

95.6 

180 2 x 20 

101.6 

160 2 x 20 

90.6 

160 2 x 20 

90.6 

160 2 x 10 

97.7 

140 2 x 20 

95.7 


Using our algorithm on various scenes we have found that 
there always existed an optimal (in the sense of runtime) 
subdivision granularity. Due to memory constraints this 
subdivision can not always be attained. Since in most 
scenes the large majority of rays are view rays it is advis¬ 
able to transform the database such that the view plane 
is orthogonal to one of the coordinate axes. By subdivid¬ 
ing along that coordinate axis coarsely the incidence of 
a ray being intersected multiple times with the same ob¬ 
ject is reduced. At the same time a tight fit—and a small 
candidate object list—is achieved through the fine sub¬ 
division in the image plane. This holds across different 
machine sizes as well (see figure 4). 

Conclusions and future work 

We have developed a data parallel ray/object intersec¬ 
tion algorithm which scales in the number of processors 
as well as in the number of rays and objects over several 
binary orders of magnitude. Through the use of proces¬ 
sor allocation every stage of the algorithm is efficiently 
mapped onto the available resources without requiring 
explicit load balancing steps. Since the algorithm uses a 
large amount of general communication, parameters such 
as subdivision size and ordering of objects in processor 
space need to be tuned to achieve maximum performance. 
In particular we observe that the voxelization should be 
coarse along the view rays and fine parallel to the image 
plane. The algorithm has been incorporated successfully 
into a massively parallel radiosity algorithm [13]. 

In the current implementation the performance is limited, 
since we do not take advantage of any coherence along the 



number of processors 


Figure 4: Runtimes for different, weighted voxelization 
scales. The image size is 256 2 and subdivision in Z is 20 
for all runs while the subdivisions in X and Y are 40/40 
and 160/160 respectively. The database was the same as 
in the other examples. 
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ray. When an object overlaps several voxels along a given 
ray, the ray will be intersected with it multiple times. 
Many unnecessary intersection tests also occur because 
rays are intersected in parallel with all the objects along 
the ray’s length. This implies that there is no notion of 
stopping as soon as the first intersection is found. Cus¬ 
tomary acceleration techniques like shadow hit caches, 
are not currently incorporated into our approach. These 
shortcomings cannot trivially be addressed in a strict 
SIMD framework as it is provided by the CM2 Connec¬ 
tion Machine System but would be straightforward in 
a data parallel MIMD framework as given by the CMS 
computer, or other MIMD machines. In particular the 
intersection tests should progress through the candidate 
lists of each individual voxel in parametric order along 
the ray and finish as soon as an intersection is found. 
The problem of a ray intersecting itself multiple times 
with the same object is not as easy to address. The usual 
technique of depositing a tag with the object consumes 
too much memory in our case since all rays are traced 
in parallel and a possibly very large number of rays can 
attempt to deposit a tag. 

The absolute performance of the current implementation 
is comparable to the best algorithms on the fastest work¬ 
stations. This is mostly due to the lack of optimizations 
in our current implementation. As mentioned above some 
of these could be added, while others would require the 
more flexible framework of a MIMD computer. The al¬ 
gorithm also requires a large amount of general commu¬ 
nication during load balancing. Current communications 
networks do not provide enough bandwidth to keep this 
part of the algorithm from consuming disproportional 
amounts of time. We expect this to change as commu¬ 
nication networks become faster relative to CPU speed. 
The main feature of our algorithm remains its ability to 
scale over a wide range of resources and demands. 
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Implementation details 

Building the voxel overlap lists 

When a database is initially loaded into the system the 
first step is to build the lists which, for a given voxel, 
hold all the names of objects intersecting that voxel. For 
each object type in turn each object allocates enough 
processors to hold all the voxels that its bounding box 
overlaps (see figure 5). Call these cand_voxels. Using 
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Figure 5: A 2D example of objects on a voxel grid and 
the voxels their bounding boxes overlap. These voxels are 
tested against the objects themselves. Only those that 
actually do intersect with the object, generate an entry 
in the object_f etch-list. 
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Figure 6: A 2D example of “chopping” lines to the voxel 
grid along the two dimensions in turn. 


the forward pointers, the object data is sent to the be¬ 
ginning processor in each candidate segment. With a 
segmented-copy-scan this data can be propagated to ev¬ 
ery processor in its segment. At this point we properly 
intersect the candidate voxels with the respective object. 
Every processor that finds an actual intersection sends 
a “1” to the objects-per.voxel accumulator. After the 
intersection step each voxel holds the length of its ob¬ 
ject list in object8-per-voxel. In a second processor 


allocation step these numbers are used to allocate pro¬ 
cessors for each voxel to hold the actual lists of object 
ids. Call this the object-fetch-list. Next we find a 
ranking of the voxel names in cancLvoxels. Notice that 
only those processors for which intersection TRUE 
participate in this ranking. Using the rank as an ad¬ 
dress we can send the object tags to the proper places in 
the object_f etch-list, which concludes the setup. The 
object_fetch_Iist consists of segments, each of which 
corresponds to a particular voxel, that contain ids for 
those objects which actually intersect a given voxel. Each 
voxel now holds the address of the beginning of its seg¬ 
ment in the object_f etch-list as well as the length of 
that segment. We will use these two pieces of data to 
build the candidate lists for each ray. 


Generating ray/object candidate lists 


Given a set of rays we can now build the candidate object 
lists. In order to do this we need to generate a list of 
voxels that every ray pierces. Once we have this list, it is 
an easy matter to use the object-f etch-list to get the 
objects themselves for the ray /object intersection test. 
One way to generate the voxel lists for each ray, would 
be to use 3D voxel conversion of the rays with 26 neighbor 
connectivity (see [21]). This method however is not well 
suited since the rays are in general of different lengths, 
leaving many processors idle, while the longest rays are 
still being voxel converted. Instead we use a nested set 
of 3 processor allocations, in turn “chopping” the rays to 
voxel boundaries along each of the 3 axes (see figure 6 for 
the 2D case). Notice that all children of rays A and B 
are contiguous and ordered along the ray in the final pvar 
(chopped-in-x-and-y-rays). When this step is completed 
each chopped ray simply considers its endpoints to find 
the address of the voxel it crosses_ 
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objects_in_voxel-crossed indicates how many intersec¬ 
tion processors need to be allocated. After this alloca¬ 
tion, the segment addresses and the ray names are prop- 


a gated into the allocated segments _ 

I chopped__in_x_and__y_rays 



segmen t_addr es s_in — obj ec t_f e t ch_l i s t 



Using the segment bit the addresses of all the objects in 
a given segment can be generated by incrementing the 
segment addresses through the length of each segment 
and the ray data is filled in with a segmented-copy-scan 



Ray/object intersection 


With all the variables from the generation of the candi¬ 
date list in hand we now only need an indirection through 
the object-f etch-list to pair up each ray with its can- 



Since the computed ray/object intersections are ordered 
within their respective segments in the parameter value 


along the ray we can execute a down ward- mm -reduce on 
the computed parameter value of the actual intersections 
to find the closest (if any) intersection 



At this point the algorithm returns with the intersected 
object and the parameter along the ray where the inter¬ 
section occurred. 
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Abstract 

A method is presented for exploiting object space temporal 
coherence to speed up ray tracing of animation sequences 
where the camera remains static. The object space is 
subdivided with a hierarchical voxel grid structure. Each 
voxel keeps a list of the rays that pass through it when the 
first frame of a sequence is rendered. To render a successive 
frame, only rays that passed through voxels in which an 
object has moved are retraced. The method speeds up ray 
tracing of a test animation sequence by nearly a factor of 
four. 

The method is easily adapted to work with any spatial 
subdivision technique. The memory requirements of the 
method are low. 

Keywords: Ray Tracing, Frame Coherence, Space 
Subdivision. 

1 Introduction 

Ray tracing [Whitted 80] is an elegant technique for 
synthesizing realistic images. Numerous acceleration 
methods have been developed to reduce the computational 
expense of ray tracing, including spatial subdivision 
[Glassner 84] [Fujimoto 86], hierarchical object extents 
[Rubin 80] [Kay 86], clustering and sweeping methods 
[Amanatides 84] [Heckbert 84] [Shinya 87], and ray 
coherence [Joy 86]. These methods are effective when 
rendering a single image, but do not make use of the 
temporal coherence found in animation sequences. 

The traditional way to render an animation sequence is to 
render each frame one at a time, ignoring any object space 
temporal coherence that may exist between frames. Object 
space temporal coherence manifests itself as objects, such 
as floors or walls, that move slowly or remain static 
throughout the course of an animation. 

The aim of the research presented in this paper is to take 
advantage of object space coherence to speed up ray tracing 
of animation sequences. To be useful for high quality 
rendering, the method must produce images that are 


indistinguishable from those rendered with a traditional 
one frame at a time approach. 

2 Previous Work 

Algorithms for exploiting temporal coherence operate 
either in image space or object space. Image space 
algorithms reduce the time to render an animation sequence 
by rendering a subset of the pixels in a frame, and 
estimating the value of the unrendered pixels. Due to their 
sampling nature, image space algorithms may generate 
"incorrect” frames - frames that differ from those rendered 
with a traditional one frame at a time approach. Object 
space algorithms use information about the 3D object 
space, and how it changes between frames, to reduce the 
amount of computation to render an animation sequence. 

2.1 Image Space Temporal Coherence 

Badt [Badt 88] proposed a method that reduces the number 
of rays traced during an animation by tracing the first frame 
of a sequence normally, then rendering successive frames 
by retracing only a small random sampling of pixels for 
each frame. If a retraced pixel's color differs from its color 
in the preceding frame, a flood fill algorithm that floods 
both in screen space and in time is used to correct its color. 
Flooded pixels are retraced for preceding and succeeding 
frames to determine their correct colors. The flood filling 
reduces, but does not eliminate, the possibility of incorrect 
pixel colors. This method requires that the object space 
description for every frame of the animation sequence be 
available at all times during the rendering. 

Chapman [Chapman 90] developed another image space 
algorithm that traces fewer rays than Badt’s method, but is 
potentially less accurate. The algorithm renders every kih 
frame of a sequence, where Jc >= 1. The pixel colors of 
frame n and frame n+k are compared. If a pixel's color is 
different in the two frames, then it is retraced at frame 
n+k/2. This process is repeated recursively, resulting in a 
binary search that determines the frame in which the pixel's 
color changed. The drawback of this algorithm is that if k 
is chosen to be large, high frequency changes, such as 
those caused by fast moving objects, will be lost. 
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2.2 Object Space Temporal Coherence with a 
Moving Camera 

Hubschman [Hubschman 82] presented a method for 
exploiting object space temporal coherence when 
rendering sequences where only the camera moves. The 
first frame of a sequence is preprocessed to determine object 
visibility, and successive frames are generated by 
determining which objects have changed their visibility 
status. While the technique creates "correct" images, it 
does not work when objects move during the animation. 


Frame 1. Frame 2. 



2.3 4D Ray Tracing 

Spacetime ray tracing [Glassner 88] accelerates ray tracing 
of animation sequences through the use of hierarchies of 4D 
bounding volumes that encompass objects as they move 
through space and time. Rather than building a hierarchy 
of 3D bounding volumes for each frame in the animation, a 
hierarchy of 4D bounding volumes is created once for the 
animation sequence. Rays are represented as a 3D direction 
vector and a fourth component, their position in time. 
Rays are traced by testing them for intersection with the 4D 
bounding volumes in the scene. Only objects that lie 
within the 4D bounding volumes that are intersected by a 
ray need to be tested for intersection with it. 


Legend 

unchanged voxel, 
changed voxel. 

— —ray is not retraced. 

-ray is traced. 

Figure 1. 

reducing the effectiveness of the hierarchical bounding 
volume approach. 



The main source of efficiency in this algorithm is that 
fewer bounding volumes are created for an animation 
sequence than with a traditional 3D bounding volume 
approach. This accelerates both the creation of bounding 
volumes and reduces the number of ray/volume intersection 
tests required to render a sequence. Motion blur by jittering 
rays in time is facilitated since the entire animation 
sequence is available to the Tenderer at each frame. 

One drawback to this approach is that it requires an entire 
animation sequence to be resident in memory during 
rendering. The method is also not amenable to a voxel- 
based spatial subdivision approach. Thirdly, it does not 
reduce the number of rays that need to be traced at each 
frame. 

Chapman, Calvert, and Dill [Chapman 91] developed a 
similar algorithm for using hierarchies of bounding 
volumes of animated objects. The difference with their 
approach is that objects inside the bounding volumes 
represent their motions as translation and rotation vectors. 
The ray/object intersection calculation is extended to 
encompass intersection with a moving object and to 
compute all intersections of a ray with a moving object. 
These intersections are sorted by time and distance along 
the ray. From this information the colors for a ray are 
calculated for the entire sequence, and each ray is traced 
only once for a given sequence. 

Disadvantages to this technique are the complexity of 
intersection calculations and that it may not be readily 
extensible to handle motion that cannot be represented as 
simple translation and rotation vectors. Furthermore, the 
object bounding volumes may become large if an object 
moves significantly during the animation sequence, 


2.4 Object Space Temporal Coherence with 
a Static Camera 

When the camera remains static during an animation 
sequence, the color of a pixel can change from one frame to 
another only if an object that is visible to the pixel has 
changed, or if an object has moved to become visible to the 
pixel. An A-buffer scan conversion Tenderer [Carpenter 84] 
can make use of this by keeping a list, throughout the 
animation, of the surface fragments that lie under each 
pixel. If an object moves between frames, it is deleted from 
the fragment lists of all pixels. The moved object is then 
rescanned into the frame buffer in its new position, and is 
added to the fragment lists of the pixels into which it 
scans. Pixels whose fragment lists have changed are then 
re-evaluated to determine their new color values. 

Sequin presented a ray tracing algorithm that stores the ray 
tree at every pixel so that surface attributes of visible 
objects can be changed without having to retrace the image 
[Sequin 89]. The method fails when objects change then- 
positions, since it cannot determine if the visible surface 
for a ray has changed. 

Murakami and Hirota [Murakami 90] extended the 
algorithm to handle animated objects by subdividing the 
space with a voxel grid, and keeping a list of traversed 
voxels for every ray in a ray tree. To render a subsequent 
frame, all objects that move are deleted from the voxel grid, 
and are reinserted in their new positions. The ray trees are 
then examined and only rays that traversed through voxels 
in which an object has moved are retraced (Figure 1). A 
hashing scheme for representing a ray’s path through the 
voxel space is used to speed up the process of determining 
which rays to retrace. 
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The memory requirements of the Murakami and Hirota 
algorithm are large, typically on the order of tens of 
megabytes, and grow rapidly as image resolution increases. 
Computational requirements also grow as a function of 
image resolution because the ray tree of each pixel must be 
examined to determine whether the ray passed through a 
changed voxel. Their algorithm is also specific to uniform 
voxel subdivision due to its use of a voxel index hashing 
scheme. 

2.5 A New Algorithm 

This paper presents an algorithm for making use of object 
space coherence to speed up ray tracing of animation 
sequences in which the camera remains static. The 
algorithm’s memory requirements are independent of image 
resolution, and it is easily adapted to any spatial 
subdivision scheme such as uniform voxel subdivision, 
octree subdivision, adaptive voxel subdivision, or 5D 
space subdivision [Arvo 87]. 


Lecend 




light source. 

rays directly affected by voxel A. 

rays affected by the shadow of an object in voxel A. 


hi a screen pixels directly affected by voxel A. 
■■ a' screen pixels indirectly affected by voxel A. 


3 The Algorithm 

Rays are tagged with their x t y pixel index in the image 
frame buffer. As rays are traced through a spatially 
subdivided scene, each voxel keeps a record of the x,y 
indices of rays that pass through it. For subsequent frames, 
when objects inside a voxel move, the voxel notifies the 
frame buffer of the pixels that will be affected. Only those 
pixels that are affected are retraced at each frame. 

3.1 The Ray Tracer 

The ray tracer used to develop this algorithm utilizes an 
adaptive voxel subdivision scheme [Jevans 89], although 
any object space subdivision scheme can be adapted to use 
the algorithm. The object space is subdivided by a voxel 
grid. Each voxel maintains a list of pointers to the objects 
that intersect or lie within it. If the number of objects 
inside a voxel is larger than some threshold, the voxel is 
itself subdivided with a voxel grid. A set of heuristics, 
based on the number of objects in a voxel, is used to 
determine the granularity of the subdivision [Jevans 91]. 

Space subdivision is done on the fly when a ray first enters 
a voxel. This lazy evaluation technique ensures that 
computation and memory are not wasted subdividing areas 
of the object space that are not visible. To ensure that 
voxels are only subdivided the first time a ray enters them, 
they are initially marked as not subdivided. When a ray 
enters a voxel, the heuristics are used to subdivide it, and it 
is marked as subdivided. Newly created sub-voxels are 
marked as not subdivided, as they will be considered for 
subdivision only if rays pass through them. Voxels 
marked as subdivided are not considered for subdivision 
when successive rays enter. Note that if the number of 
objects in a voxel is small, no subdivision may occur, but 
it will still be marked as subdivided. 


Figure 2. 

3.2 Rendering the First Frame 

Every pixel in the first frame of the animation sequence is 
ray traced. Rays are labeled with their originating pixel's 
x } y frame buffer index. When a ray passes through a 
voxel, a record of its pixel index is stored with the voxel. 
This information is stored for all voxels, whether they are 
empty or not, and whether they are leaf or interior nodes of 
the subdivision tree. 

Each voxel has a 16 by 16 bit-table to store the Jt,y pixel 
indices of the rays that pass through it. Each bit represents 
a block of pixels that occupy 1/256^ of the screen area. 
This storage method is independent of image resolution, 
and only requires 32 bytes of memory per voxel. Higher 
resolution bit-tables can be utilized if memory usage is not 
a constraint. Higher resolution bit-tables provide finer 
granularity of the rays that will be traced at each frame, 
with little increase in computational overhead. The ideal 
resolution for the bit-tables is the resolution of the image 
frame buffer. 

When a ray enters a voxel, the bit in the voxel’s bit-table 
that corresponds to the ray’s x,y index is set. The voxel’s 
bit-table may represent disjoint areas of the screen. This 
occurs when an object is visible to both primary viewing 
rays and to secondary rays, such as shadow or reflection 
rays, in another pail of the screen (Figure 2). 

3.3 Subsequent Frames 

For each subsequent frame, the object space database and 
the subdivision structure need updating to reflect changes 
that have occurred since the previous frame. The entire 
subdivision tree is traversed, and every voxel is marked as 
unchanged. Objects that change from the previous frame 
are reinserted into the voxel subdivision tree, and the 
voxels that they affect are marked as changed. 
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If an object is deleted from the scene, the voxels in which it 
lay are marked as changed, and any references to the 
object are deleted from these voxels. If an object is added 
to the scene, the voxels in which it now lies are marked as 
changed, and references to it are added to those voxels. If 
an object moved or changed shape or surface attributes, 
both the voxels in which it lay and the voxels to which it 
moved are marked as changed, and references are added and 
deleted as appropriate. When marking a leaf node voxel as 
changed, the voxels above it in the hierarchy are marked 
as touched. 

As long as the number of objects that change is fewer than 
the number of objects that remain static, the time to resort 
the changed objects into the subdivision structure is less 
than to completely rebuild the structure. This speedup is 
not significant, however, as the total subdivision time of 
adaptive subdivision algorithms is typically on the order 
of a few percent of the total rendering time [Jevans 89]. 

3.3.1 Examine the Voxel Space 

The next step is to examine the voxel space to determine 
which pixels need retracing. A 16 by 16 bit-table 
representing the frame buffer is created and every bit is 
initialized to zero. Starting at the top level of the 
subdivision tree and working down, every voxel is 
examined. If a leaf voxel is marked as changed, the frame 
buffer bit-table is or-ed with the voxel's bit table. After the 
entire voxel space has been examined, the bits that are set 
in the frame buffer bit-table indicate which pixel blocks 
must be retraced. 

All records of the rays that are about to be retraced must be 
deleted from the voxel bit-tables in case the retraced rays do 
not pass through those voxels in the next frame. This is 
accomplished by examining the voxel space a second time 
and clearing all bits in the voxel bit-tables that are set in 
the frame buffer bit-table. 

The frame can now be generated. Pixels that correspond to 
the bits that are set in the frame buffer bit-table are retraced. 
All other pixels retain their color values from the previous 
frame. 


3.3.2 Resubdivision 

If the number of objects in a voxel changes significantly 
from one frame to another, it may be advantageous to 
resubdivide it. This can be determined during the 
examination of the voxel space. If a voxel is marked 
touched or changed, and the number of objects inside it 
has changed significantly or gone to zero, its child voxels 
are recursively deleted, and it is marked as changed and 
not subdivided. It will be examined for resubdivision on 
the fly when rays are being retraced. 



□ 

a 

□ 


unchanged voxel. 

changed voxel. 

skipped voxel. 

ray is retraced. 


Figure 3. 

3.3.3 Ray Traversal Optimization 

An optimization to the traversal of viewing rays through 
the voxel grid can be made by storing the distance along 
each ray from its origin to its intersection with the visible 
surface. When a viewing ray is retraced, the voxel traversal 
algorithm can treat any non-changed voxels as empty if 
they are closer to the eye than this distance. Neither ray- 
object intersection calculations nor traversal of sub-grids 
need be performed for voxels that are not marked as 
changed (Figure 3). 

4 Analysis of Animation 

Since this algorithm requires that the camera remain static 
during an animation sequence, it is of interest to know if 
such sequences constitute a significant portion of computer 
animation. Table 1 presents statistics for the duration of 
time that the camera remains static for several well known 
animations. The timings in Table 1 are approximate 
however, because they do not account for cuts in the 
camera's point of view, nor for camera holds, which can be 
rendered as a single frame and replicated during Filming. 

For the films analyzed in Table 1, static camera sequences 
account for a significant portion of the animation. 
Naturally there are degenerate cases, such as fly-by 
sequences, where static camera sequences are few or 
nonexistent. However, for animations that include static 
camera sequences, a method that accelerates the ray tracing 
of such sequences can have a significant impact on the 
overall rendering time. 
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Animation 

Running time 

Static camera time 

% static camera 

Luxo Jr 

131 sec 

131 sec 

100% 

Red’s Dream 

320 sec 

259 sec 

81% 

Tin Toy 

451 sec 

418 sec 

93% 

Pencil Test 

253 sec 

231 sec 

91% 

The Audition 

309 sec 

240 sec 

78% 


Luxo Jr, Red's Dream, Tin Toy © 1986, 1987, 1988 PIXAR. 
Pencil Test, The Audition © 1988, 1990 Apple Computer, Inc. 


Table 1. 


5 Results 

A sequence from the Apple Computer, Inc. animation "The 
Audition”, shown at SIGGRAPH '90, was used to test the 
object space coherence algorithm. In this sequence a 
weight is dropped onto the see-saw, launching Eric the 
worm into the air. The motions of Eric, the see-saw, and 
the weight are derived from a dynamic simulation. 

The sequence is 351 frames in length. The scene consists 
of 6000 polygons and 4 light sources. All frames were 
rendered at 640 by 480 resolution, with one ray per pixel, 
on a Silicon Graphics Personal Iris 4D/25 workstation. 

The sequence was rendered one frame at a time with a ray 
tracer that utilizes an adaptive voxel subdivision technique. 
The number of pixels traced and the CPU time required to 
render the sequence are listed in Table 2 under the heading 
Traditional Algorithm. The sequence was then 
rerendered with the identical ray tracer, modified to use the 
object space coherence algorithm described in this paper. 
The CPU time, number of rays, and the ratios of these 
numbers compared to the traditional frame by frame 
approach are listed in Table 2 under the heading 
Coherence Algorithm. 


Figure 4 shows several frames of the animation sequence 
illustrating only the pixels that were retraced by the 
coherence algorithm. Note that all the pixels of frame 0 are 
rendered by both the traditional and coherence algorithms. 

5.1 Discussion 

Examining the Entire Sequence row in Table 2 
illustrates that over the course of the animation sequence 
the coherence algorithm rendered only 19.35% of the rays 
that were traced by the traditional algorithm, and required 
only 26.72% of the CPU time used by the traditional 
algorithm, yielding a speedup of nearly a factor of four. 

The discrepancy between the percentage of rays traced 
(19.35%/) and percentage of CPU time (26.72%) required to 
render the sequence with the coherence algorithm is due to 
two factors. First is the overhead incurred by the coherence 
algorithm in building and maintaining bit-tables in each 
voxel and of collecting them at the beginning of each 
frame to determine the pixels that must be retraced. This 
overhead is apparent in Table 2 in the row that gives the 
statistics on the rendering of frame 0. The unmodified ray 
tracer requires 666 CPU seconds to render the frame whereas 
the frame coherence algorithm increases the rendering time 



Traditional Algorithm 

Coherence Algorithm | 

Frame # 

# Rays 

CPU Time 

# Rays 

Ratio to 

Traditional 

CPU Time 

Ratio to 
Traditional 

0 

307,200 

666 sec. 

307,200 

1.0 

780 sec. 

1.171 

1 

307,200 

669 sec. 



418 sec. 

0.6248 

75 

307,200 

684 sec. 

146,400 

0.4765 

528 sec. 

0.7719 

150 

307,200 

649 sec. 

58,800 

0.1914 

209 sec. 

0.3220 

200 

307,200 

634 sec. 

9,600 

0.0312 

44 sec. 

0.0694 

350 

307,200 

655 sec. 

8,400 

0.0273 

43 sec. 

0.0656 

Entire 

Sequence 

107,827,200 

63.32 hrs. 

20,866,800 

0.1935 

16.92 hrs. 

0.2672 


Table 2 
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Frame 1 Frame 15 Frame 35 



Frame 40 


Frame 75 


Frame 150 



Frame 200 Frame 350 

Figure 4 
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to 780 CPU seconds, a ratio of 1.17. This overhead is more 
than offset by the savings in subsequent frames. 

The second source of discrepancy is due to the fact that rays 
are not uniform in their rendering cost. In this animation 
sequence, a complex object, the worm, is being retraced at 
each frame. The area around this complex object is more 
densely subdivided than the rest of the scene, requiring 
more traversal time per ray. The worm also has a more 
complex illumination model than the background model. 
The rays that are not retraced at each frame are typically 
those that intersect the background of the scene. These 
rays travel largely through empty voxels and intersect 
more simple objects, such as the tent model in this 
animation sequence. 

6 Future Work 

6.1 Inactive Voxel Collection 

Adaptive spatial subdivision algorithms can reduce the 
amount of memory they require by taking advantage of ray 
coherence. When rendering an image, parts of the 
subdivision structure can be deleted if rays are no longer 
passing through them. This is common when rendering 
scanlines from top to bottom, as rays originating from 
scanlines near the bottom of the screen rarely pass through 
the same voxels as rays from higher scanlines. Voxels that 
are no longer active can be identified periodically during 
the rendering, and can be collected. This entails deleting 
the voxel's grid structure, and marking the voxel as not 
subdivided . If the assumption proves incorrect, and a ray 
passes through the voxel at a later time, the voxel will be 
resubdivided. 

This idea can be extended to the temporal coherence 
algorithm by collecting areas of space that remain 
unchanged and untraversed for a number of frames. If an 
object inside a collected voxel changes, or a ray traverses 
the voxel, then it will be resubdivided. 

6.2 Light Sources 

When animating a light source, all rays that pass through 
the voxels in which it lies must be retraced. Since most ray 
tracers treat light sources as invisible if viewed directly, it 
is desirable to avoid retracing viewing rays that pass 
through a voxel in which a light source has moved (Figure 
5). If a separate pixel index bit table for shadow rays is 
maintained in each voxel, then only ray trees that are 
affected by a moved light source are retraced. 



viewing ray is erroneously retraced, 
ray is retraced. 



light source that has been moved. 


Figure 5. 


6.3 Moving Camera 


The algorithm presented in this paper may be extensible to 
sequences where the camera is moving, through the use of 
the reprojection technique proposed by Badt [Badt 88]. The 
3D intersection points of the directly visible surfaces are 
projected onto the screen when the camera moves. If the 
camera moves only slightly, then the samples will not 
change in density, and a new image can be reconstructed 
from them. If the density of the samples changes, then the 
pixels will have to be retraced to avoid erroneous hidden 
surface results. 


6.4 Backwards Ray Tracing 

The object space coherence algorithm is useful, even with a 
moving camera, to accelerate backwards ray tracing 
techniques. Heckbcrt uses bidirectional ray tracing to 
calculate global illumination [Heckbert 90], and Watt uses 
backwards beam tracing to calculate light-water interaction 
[Watt 90]. Both methods could reduce the number of view 
independent rays required to render an animation sequence 
with non-moving light sources. 

7 Conclusion 

An algorithm has been presented for making use of object 
space temporal coherence when ray tracing animation 
sequences where the camera remains static. Only rays that 
pass through voxels in which objects have changed are 
traced at each frame. Memory use is independent of image 
resolution, and the algorithm is easily adapted to any 
spatial subdivision scheme. 
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1. Abstract 

Ray tracing consumes a lot of computational resources 
to render images. This expense usually lies in the ray-surface 
intersection tests. If the surfaces were polygonal, then we 
should be able to apply more polygon-specific optimizations 
to partially cull intersections. Our ray tracer uses a non¬ 
memory intensive, voxel traversal intersection culler to assist 
in such optimizations. 

Keywords: intersection culling, polygon, ray tracing, subdivi¬ 
sion, voxel traversal. 

2. Introduction 

Ray tracing [Appe68] [Gold71] is widely ack¬ 
nowledged as a rendering approach that can produce very 
realistic and beautiful images [Whit80], It is also widely 
known that ray tracing is very expensive computationally. 
Many intersection culling algorithms have been proposed to 
reduce this expense. However, such intersection culling algo¬ 
rithms do not take into consideration the nature of the primi¬ 
tives which they are culling. 

The polygon is one of the most used primitives in 
rendering surfaces - either as a result of tessellation of com¬ 
plex surfaces, or as descriptions of truncated planar surfaces. 
In this paper, we examine the polygon veiy carefully, in hopes 
of optimizing and reducing the need for ray-polygon intersec¬ 
tions, while keeping the memory requirements down to a 
minimum. The intersection culling algorithm used to assist in 
such optimizations is uniform voxel traversal [Fuji86]. Our 
traversal implementation is taken from [Aman87] [Clca88] 
with ray bounding box checks [Snyd87], 

3. Voxel Traveral and Ray Bounding Boxes 

A popular intersection culling algorithm is voxel 
traversal [Glas84] [Fuji86]. Space encompassing all polygons 
in the scene is divided up into small 3-dimensional boxes, 
commonly known as voxels. Each voxel contains a pointer to 
polygons that reside in the space occupied by that voxel. 
Each ray generated traverses the voxel structure in-order and 


tests for intersection only with polygons residing in voxels 
that the ray pierces. Thus we hope a small candidate subset of 
polygons needs to be tested for intersection. 

To further reduce the number of polygons that needs to 
be tested for intersection, Snyder and Barr [Snyd87] proposed 
the ray bounding box. A ray bounding box in a voxel is 
created from the ray segment that resides inside that voxel, 
bounded by the tMin and /Max extents/distances through the 
voxel. For each polygon in the voxel, if its bounding box 
does not cross the ray bounding box, then no intersection test 
with that polygon is necessary; see figure 1. If they do cross, 
then the ray-polygon intersection test is needed. This box- 
crossing test requires at most 6 floating point comparisons. 



This ray bounding box optimization has proven to 
accelerate the ray tracing culling process by a great deal, 
especially for densely populated regions distributed in a non- 
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uniform spatial manner, in which the raw uniform voxel 
traversal scheme does quite miserably. See table 1 bench¬ 
marks for this evidence. However, by using ray bounding 
boxes, we are restricted to floating point voxel traversal 
schemes [Glas84] [Aman87] [Snyd87] because the tMin and 
tMax values need to be computed, and thus integer-only ver¬ 
sions [Fuji86] [Clea88] cannot be used (their increase in com¬ 
putational speed is neglible compared to the advantages of the 
ray bounding box anyway). 

4. Usual Ray-Polygon Intersection Process 

The usual ray-polygon intersection test involves the 
following steps: (1) intersection against the plane on which 
the polygon lies to compute the hit distance t; (2) check that / 
is in front of the ray origin (/ > 0) and t is not in front of any 
already intersected hits (f < tHit ) - if either is false, then do 
not proceed any further as we have already decided that this 
polygon cannot be the closest visible polygon; (3) use the t 
value to compute the intersection point; (4) check that the 
intersection point lies inside the polygon: this is known as the 
inside-outside check. 

Of ail the above steps, the inside-outside check (4) is 
usually the most expensive. So we try to avoid this step as 
much as possible. One previous attempt to avoid (4) was 
illustrated in [Woo90], where after step (3), the intersection 
point is checked against the bounding box of the polygon. If 
the intersection point lies outside the box, then this polygon 
cannot possibly be hit by the ray - this check requires 6 float¬ 
ing point comparisons. Furthermore, there is no need to com¬ 
pute all the x f y,z intersection points before checking with the 
bounding box. Computing the x intersection point followed by 
checking with the x extents of the bounding box, then repeat 
with the y and z extents, will be all that much more efficient. 
This optimization appears to be very effective for tessellated 
polygons. 

5. Order of Candidates for Intersection 

In step (2), wc also make sure that the t value of the 
current polygon is not in front of any intersection hits tHit that 
have already taken place, i.e., t < tHit. If t > tHit , then even 
if this ray does intersect the polygon, it will not be the closest 
visible polygon. So why bother with steps (3) and (4)? This 
leads us to think that it is advantageous to have the closest 
visible polygon tested for intersection near the beginning and 
all other candidate polygons can be trivially dismissed from 
the t < tHit check (as well as the advantage to be described in 
section 6). 

5.1. Dynamic Updating of the Database 

For each voxel, there exists a linked list of candidate 
polygons that occupy the voxel. Ray-polygon intersection 
tests occur in-order through the linked list. For our optimiza¬ 
tion, when a ray intersects the closest visible polygon inside 
the voxel, that polygon is shifted up to the beginning of the 
linked list. Future rays that pierce the voxel may have the 
same visible polygon but now have the advantage of intersect¬ 


ing the visible polygon first (or close to first). Then many 
other ray-polygon intersection tests are rejected at step (2). 

This optimization should theoretically be quite effec¬ 
tive for shadow determination. It is common practice to 
assume that what the previous shadow ray hits may be true for 
the current shadow ray [Hain86]. In addition, shadow rays 
only need to determine if there exists an intersection hit or 
not. Thus updating the voxel linked lists might lead to an ear¬ 
lier intersection hit for future neighbouring shadow rays. In 
addition, this is better than just keeping one polygon pointer 
to what was previously intersected [Hain86], since neighbour¬ 
ing shadow rays might intersect polygons A, then B , then A , 
then B , etc. in that order. With our optimization, we should 
detect intersection within the first few tests of that voxel. In 
addition, this optimization would be a nice complement to 
another shadow culler [Pear91] in which triangles residing in 
the last shadow ray hit voxel are intersected first. 

However, we found that the linked list updates for sha¬ 
dow rays do not perform that well. Perhaps the [Hain86] 
[Pear91] optimizations have already done quite a lot of the 
work in our implementation already. And having many 
lights, such linked list updates may prove to be quite useless. 
Thus, this linked list update is only done for non-shadow rays. 

5.2. Almost Hit Cases 

In the previous subsection, we only shifted the hit 
cases up to the front of the voxel’s linked list so that future 
rays will hit the visible polygon near the beginning. How¬ 
ever, chances for the same visible polygon are not as likely 
due to the small tessellated polygons. Thus, we should shift 
pointers up to the front of the linked list for nearly hit 
polygons as well. Then we will be able to get even better 
results on reaching future visible polygons faster. 

An almost hit case will be one that returns a no hit 
intersection while in step (4) of the ray-polygon process 
(inside-outside check), and after the [Woo90] optimization 
check. This optimization check is quite reliable because it 
eliminates many candidate polygons, except ones that the ray 
is close to. 

5.3. Using the RaylD Effectively 

In the previous subsections, we needed to do some 
linked list shifting. The raylD [Aman87] (first proposed to 
eliminate multiple intersections with the same object in a 
voxel traversal environment) can be used to select better can¬ 
didates for intersection without such shifting. For a linked list 
of candidate polygons to intersect, a 2-pass walk through of 
the list is needed. The first time the linked list is traversed, 
only the polygons whose raylD is quite close to the current 
ray’s raylD are intersected. Then the second time Ihc list is 
traversed, the remainder of the candidate polygons arc inter¬ 
sected. 

We do this optimization because the closer the polygon 
raylD is to the current ray’s raylD , the indication that previ¬ 
ous rays have attempted to intersect with this polygon. Thus 
it is more probable that this polygon may be hit by the current 
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ray. A polygon’s ray ID that is very different from the ray’s 
raylD indicates that the polygon has not been intersected 
lately by previous rays - thus it is likely that the polygon will 
not be the visible polygon. 

On trying out this optimization, it appeared that the 
raylD is not really a good indicator of better candidates, espe¬ 
cially when in section 6, dynamic ray boxes are used. Thus, 
this optimization was removed from our implementation. 

6. Dynamic Ray Bounding Box 

For each voxel, Snyder and Barr [Snyd87] suggested a 
box-crossing test between the ray bounding box and each 
polygon’s bounding box. If they do not cross, then we know 
that intersection with the polygon must fail without any actual 
ray-polygon intersection tests. We can do a little better: once 
we get any intersection iHit (not just the closest one) with the 
ray, we can also dynamically reduce the size of the ray bound¬ 
ing box. In other words, the ray bounding box is bounded by 
[tMin, min {tHitjMax)], instead of [iMin, tMax] With the 
section 5 optimizations, we hope that the closest visible 
polygon with hit tHit will be encountered near the beginning 
of the voxel’s linked list. As a result of this, the ray bounding 
box can be adjusted earlier and more ray-polygon intersection 
tests can be avoided from the box-crossing test. 

Figure 2 



not cross other 
polygon boxes 


t If iHit > tMax > then the hit intersection point resides in a subsequent 
voxel. That voxel, when reached, can be bounded by [ tMin y tHit ], 


This new ray bounding box does not require additional 
computation, since it is already done in the ray-polygon inter¬ 
section hit. In addition, note that this optimzation does not 
delay intersection, but is a guaranteed culling step; sec next 
subsection’s pseudo code. It can be interpreted as a form of 
object coherence, where we base our assumptions of visibility 
on a previous neighbouring ray. For example, in figure 2, if 
we intersect the visible polygon PI first as a result of the sec¬ 
tion 5 optimizations, then all the other polygons P2, P3, P4, 
(which may belong to the same convex surface) residing in 
the same voxel can be trivially rejected for intersection due to 
the newly adjusted ray box. If the ray bounding box is not 
adjusted, then intersections with all those polygons would be 
needed. 

Note that the starting ray bounding box for a voxel 
docs not need to be bounded by [/Mm, tMax] either. For 
each occupied voxel traversed, we added a ray-box intersec¬ 
tion check with the bounding box containing all the polygons 
(bounded by the voxel) inside the current voxel - refer to this 
bounding box as B. This is done in hopes that the polygons 
only occupy a small region of the space inside the voxel, and 
that the ray might miss all of them - we only do this intersec¬ 
tion check if B is smaller than the voxel itself. If the ray docs 
not intersect 5, then we just continue traversal. However, if 
the ray does intersect B , we have to perform box crossing and 
possibly intersection tests with those polygons inside the 
voxel. Then if the ray intersects B at distances we can 

start off with a smaller ray bounding box bounded by [/,, / 2 ], 
where iMin <l j < t 2 ^ tMax . Note, in figure 3, with this new 
ray box formed by t x and / 2 , we do not have to intersect 
against the bottom sphere. 

In the case of second generation rays, it is usually that 
1 1 , tMin < 0. Then we bound the ray bounding box by 
[0, f 2 ]. This usually avoids the intersected surface, if convex, 
to test for reflection intersections against itself. Self-mirror 
reflections cannot occur for convex surfaces. 



Graphics Interface ’92 












187 


6.1. Some Pseudo Code 

/* Standard ray bounding box scheme for each voxel */ 

rayBox.boundl = 0 + voxel->t:Min * D; 
rayBox.bound2 = O + voxel->tMax * D; 
for (each polygon in voxel) 

{ 

if (boxesCross (rayBox, polygonBox)) 
hit = intersectPoly (&tHit, 

kintersectPoint, polygon); 

} 


/* New dynamic ray bounding box scheme */ 

rayBox.boundl = 0 + max[0,tl] * D; 
rayBox.bound2 = O + t2 * D; 
for (each polygon in voxel) 

{ 

if (boxesCross (rayBox, polygonBox)) 
hit = intersectPoly (&tHit, 

&intersectPoint, polygon); 

/* section 5 optimization */ 

if (hit |I almostHit) 

place polygon at the front of 
the voxel list; 

/* section 6 optimization */ 

if (hit &Sc tHit < t2) 

rayBox.bound2 = intersectPoint; 

) 

6.2. Avoiding Ray Bounding Box Evaluations 

In the above pseudo code, the boxesCross routine takes 
at most 6 floating point comparisons. However, this is needed 
for each polygon in a voxel. In some instances, these ray 
bounding box evaluations are known ahead of time to not help 
speed up the ray tracing process, then we might as well do the 
ray-polygon intersection immediately (without any box- 
esCross tests). This is when the ray crosses a voxel that 
results in a ray bounding box close to the size of the voxel. 
We can detect this situation by evaluating 
t 2 -t i > tMaxVoxel for each voxel. If true, then we do not 
bother with t he ray bou nding box checks, where 
tMaxVoxel = P Vx 2 + Y 2 + Z 2 (which is a constant), X,F,Z 
represent the dimensions of each voxel, and P is some value 
close to and less than 1 (a good choice for p is 0.75), Note 
that if p is 1, then tMaxVoxel is just the maximum ray seg¬ 
ment that can pass through the voxel. 

Then we avoid the box crossing test for this voxel 
unless the ray bounding box can be reduced by the section 5 
and 6 optimizations. 

63. Multiple Ray Bounding Boxes 

If there are a large number of polygons in the voxel, an 
obvious optimization that can be done here is to have multiple 
ray bounding boxes for this voxel. Then we have a step-case 


of smaller ray bounding boxes. If there are r ray bounding 
boxes, then the voxel’s linked list of polygons need to be 
traversed r times, each time their bounding boxes checked 
against the current ray bounding box. 

This was implemented and tested on our raytracer, but 
found the results to be quite disappointing and thus was not 
included in the final raytracer code. It appeared that with 
multiple ray bounding boxes, less ray-polygon intersections 
took place. However, many ray-polygon intersections were 
just delayed by the many box crossing tests. Then the box 
crossing tests dominated the processing time. 

7. The Ray-Plane Intersection 

The ray-plane intersection (step 1) needs to be optim¬ 
ized to improve the overall ray-polygon intersection test. The 
usual computation of t with a plane (or polygon) is: 

d-NO 

ND 

where the plane equation is defined by N'P -d % P is a 
<r,y,Z> variable triplet, N is the surface normal, and the ray 
(thus the ray-plane intersection point in step 3) is defined by 
O + tD. We will show that the t evaluations should take 6-8 
floating point operations under some circumstances. 

7.1. For Second Generation Rays 

With second generation rays being shot, we can reuse 
some previous results to save computation. In other words, 
for a reflection ray, some of the ray-plane computation can be 
reused from its parent/cast ray. 

We will subscript all cast ray information with c and 
reflected ray information (extendible to refraction and shadow 
rays as well) with r. We can expand d-NO r to 
d - N-(O c + t c D e \ where t c is the t value for the closest visi¬ 
ble polygon in the cast ray. Then we get d -NO c -t c ND C . 
As a result, only an additional multiplication and subtraction 
are needed to compute the numerator of t r in step (1): we 
already know the values of f c , d x =d- NO c and d 2 =N'D c 
from the cast ray. The reflection ray numerator is then simply 

^1 “ l c ^2- 

After some implementation and testing, we found that 
this optimization is not worth the trouble due to the small 
number of same cast and reflection ray hits, and due to the 
complicated information management within our scheme. 

7.2. For Cast (First Generation) Rays 

For first generation or cast rays, we notice that the 
numerator of the t evaluation is always a constant for the 
same triangle (assuming a perspective view). In other words, 
d - N O c is the same througout each triangle. However, it is 
far too memory consuming to store the numerator for each tri¬ 
angle in order to save dot product evaluations. 

We can preprocess and translate the entire database so 
that the eye origin resides at (0,0,0). Then N O c = 0 and the 
numerator of the / evaluation is d for all triangles, without the 
need for any storage. Thus t = d /N'D C . 
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There are other advantages to this translation as well. 
The ray-plane intersection point computation is simplified 
from O c + tD c to just tD c . We can use this tD c to simplify ray 
bounding box computations for tMin*D c and tMax*D c as 
well. However, we cannot apply this optimization when 
depth of field effects (along the flavour of distribution ray 
tracing [Cook84]) are to be generated. 

73. Dynamic Clipping of t Values 

In step (2) when the / value is computed, we need to 
check that t is in front of the ray origin. In most implementa¬ 
tions, a constant fudge value is used to evaluate this: e.g., 
t > 0.0001. However, we can use dynamic clipping of the t 
extents. We should check that t > tMin instead - actually, it is 
even better to check that t > t x (since t \ > tMin). 

This optimization allows us to omit steps (3) and (4) if 
i < t ] . This is because the intersection point is far beyond the 
polygon (beyond the voxel, in fact) that we know it cannot 
possibly be an intersection hit. With more tight values like fj 
to clip against, the better the chance that steps (3) and (4) can 
be omitted. 

8. Implications of Voxel Subdivision 

One open and difficult question in uniform voxel 
traversal approaches is the subdivision level necessary to get 
an optimal overall ray tracing performance. If the subdivision 
level is too deep, then we pay for more traversal and extensive 
memory costs, but gain the advantage of only needing to deal 
with small number of surfaces in each voxel. If the subdivi¬ 
sion level is not deep enough, we pay for the cost of having to 
perform more ray-surface intersection tests - this can 
significantly slow down the ray tracing process. Devillers 
[Devi88] attempted to answer this question with an analytic 
solution R for uniformly sized voxels. Subramanian and 
Fussell [Subr91] also attempted to answer this question in a 
similar fashion. However, both papers always assumed a 
RxRxR subdivision scheme, instead of a general XxYxZ 
scheme. In addition, there are so many variables (though spa¬ 
tial distribution and number of polygons play a major role in 
this analysis) which we must consider that it cannot all be 
encapsulated in an analytic equation. And is it worth the 
effort to compute this complicated and expensive solution 
anyway? 

8.1. Previous Benchmarks 

With ray bounding boxes (Snyd87], it seems that we 
are less reliant on the voxel subdivision as compared to the 
raw voxel traversal approach; the ray bounding boxes act as 
second level cullers in case many surfaces need to be tested 
for intersection. And since voxels do take up a lot of 
memory, it seems that we should consider XxYxZ subdivision 
schemes that are small. The justification for minimal subdivi¬ 
sion can be seen in table l, where ray bounding boxes 
(without our optimizations discussed here) were used to 
accelerate ray tracing on a SUN 3/280 with fpa, and 8 mega¬ 
bytes of memory. The benchmarks were done on a University 


of Toronto ray tracing program named optik with true spheres 
(not tessellated into many polygons), benchmarked in 1988. 
The image used was taken from the Haines* sphere flakes 
image [Hain87], where a densely populated environment dis¬ 
tributed in a non-uniform manner is created due to a large 
floor and a small, concentrated set of spheres. Note that Ray 
Box indicates the CPU minutes taken to ray trace with the ray 
bounding box and uniform voxel traversal; Raw Traversal 
indicates the CPU minutes taken to ray trace with only uni¬ 
form voxel traversal. 


^Sphere 

Grid Res 

Image Res 

Ray Box 

Raw TraversaJ 

7382 

40x40x40 

512x512 

163 

391 

7382 

50x50x50 

512x512 

142 

270 

7382 

60x60x60 

512x512 

138 

220 

7382 

70*70x70 

512x512 

133 

199 


Table 1: Ray Box vs. Voxel Traversal 

8.2. An Approximate Subdivision Level 

Another reason we sought a memory conservative 
voxel subdivision is due to our main platform - the Mac II. 
Memory conservation is so essential, we can only assume a 
maximum configuration of 8 megabytes of memory on the 
machines. This is why many of our optimizations do not take 
up additional memory and help reduce the usage of memory 
as well: an alternative to faster but more memory intensive, 
voxel-based cullers [Jeva89], 

We do not try to look for an optimal subdivision level; 
an approximate one keeping the subdivision minimal should 
do. Our subdivision scheme to be described below works 
well in general for polygons. We consider only one main 
variable: the total number of polygons in the scene - label this 
n. Let assuming a rather uniform distribution of 

polygons throughout 3-space. Then we compute the bound¬ 
ing box surrounding all polygons in the scene. Let the spans 
of the bounding box extents (difference between the max¬ 
imum and minimum extents) be labelled s x , s y and s z for each 
of the axes. Then, taking into consideration the spans that 
occupy the voxel space, 

s x s v s z 

X =-- m, Y = -- m, Z =-- m. 

ntaxS maxS maxS 

where maxS = max(s xt s yt s t ). This linearity provides us with 
more cubical voxels than the standard RxRxR subdivision. 
More cubical voxels give a good distribution of polygons 
within voxels without using up the excess memory imposed 
by a RxRxR subdivision scheme. 

8.3. Order Complexities of the Subdivision Level 

With this voxel subdivision strategy, we have placed 
an upper bound on the memory consumption. At worst, the 
number of voxels is tnxmxm, which actually equals n. And at 
worst, each voxel will contain pointers to n polygons. Thus 
the upper bound memory usage is 0(n 2 ) (actually, we can 
lower this upper bound to 0(n 5/3 ) for planar polygons if a 
smart insertion into voxels is done). However, having each 
voxel containing n polygons is very unrealistic: this means 
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that all the modelled polygons occupy a large chunk of 3d 
space. The best case memory usage is O(n), indicating that a 
polygon resides totally within a fixed number of voxels, 
which may very well be the norm for tessellated polygons. 

Since the complexity of the ray bounding boxes is 
clearly 0(n) t then the above paragraph’s complexities also 
hold true for the entire ray tracing intersection culler memory 
requirements. 

9. Testing and Analysis 

The testing was done on a standard 68030, 40 MHz, 
Mac Ufa running under the MPW3.2 environment on Multi- 
Finder 6.0.5 with 8 megabytes of memory. The C code was 
compiled under the standard MPW C compiler. In our imple¬ 
mentation, all surfaces are tessellated into triangles and 
efficiently ray traced in barycentric coordinates. Our ray- 
tracer is a ported and re-coded version of Alias’ raytracer pro¬ 
duct, version 3.0. This ported version serves as the raytracer 
for the Sketch! product on the Mac. 

Many previous papers on intersection culling algo¬ 
rithms appeared only interested in the number of ray-surface 
intersections that were done. Their only aim was to lower the 
number of such intersections. However, the computation to 
avoid such intersections may become even more costly than 
the actual intersection itself. Intersection culling research is at 
the point now where this number alone has become a less 
important indicator for speeding up the ray tracing process. 
Thus, in the upcoming testing sections, the number of inter¬ 
sections done are not stressed in our benchmarks. 

A worst-case breakdown of our ray-triangle intersec¬ 
tion scheme is as follows: step (l) requires 13 flops, step (2) 
requires 2 flops, step (3) requires 6 flops, the extra bounding 
box check [Woo90] requires 6 flops, and step (4) done in 
barycentric coordinates requires 25 flops. This comes to a 
total of 52 floating point evaluations per triangle. With the 
section 7 optimizations, the floating point count for cast ray- 
intersections is lowered to 43. 

9.1. General Testing 

The benchmarks in table 2 are listed in total CPU 
minutes and seconds to render the 640x480 images, and 
exactly 1 sample per pixel is taken. On average, it appears 
that we get about 9-14% improvement with our optimizations 
over the old timings (New Time over Old Time\ where the 
old timings represent the basic Snyder and Barr culler imple¬ 
mentation [Snyd87] and using the subdivision level calculated 
via section 8’s method. Considering how much superior ray 
bounding boxes are over uniform voxel traversal (as can be 
seen in table 1), the 9-14% improvement is not too bad. Also 
note that %Intersect is the percentage of ray-polygon intersec¬ 
tions saved with our optimizations. 

Everything in the scene is made mirror reflective, with 
a maximum recursive reflection depth of 3. Note also that the 
lamp image is non-uniformly/sparsely distributed due to the 
large floor on which the lamp lies on. The lamp image is 
densely populated due to the small nuts and bolts, as well as 


the duplicated lamp bowl on the inside and outside. 


Image 

rrn 

Grid Res 

^Intersect 

Old Time 

New Time 

Spheres 

3500 

15x9x5 

13.6% 

7:55 

6:43 

Room 

5182 

17x17x16 

10.2% 

37:20 

34:01 

Lamp 

29062 

30x10x30 

14.9% 

25:44 

23:27 


Table 2: General Optimization Benchmarks 


9.2. Object Coherence Testing 

We suspect that with a higher sampling rate will come 
better improvement results for our optimizations. This is 
mainly due to the assumption that object coherence optimiza¬ 
tions in sections 5 and 6 are more likely to get similar hits 
between subsequent rays (which will provide the majority of 
the speedups in this paper). The lamp image, at resolution 
640x480, is being used to test out this assumption (see table 
3), where a 6.7% (8.9-2.2) improvement jumps to 11.3% 
(12.2-0.9) with more sampling. Note that sampling indi¬ 
cates the maximum sampling rate in an adaptive sampling 
scheme [Whit80], Tri Time represents the timings for the 
optimizations mentioned in section 7, and New Time 
represents the timings for all our optimizations. 


Sampling 

Old Time 

Tri Time 

%Improve 

New Time 

%Improve 

lxl 

25:44 

25:10 

2.2% 

23:27 

8.9% 

2x2 

37:32 

37:11 

0.9% 

34:07 

9.5% 

3x3 

61:28 

60:55 

0.9% 

54:02 

12.2% 


Table 3: Lamp Image with Levels of Anti-Aliasing 


9.3. Voxel Subdivision Testing 

Our voxel subdivision scheme proposed in section 8 
needs to be verified, for it is difficult to accept that such 
minimal subdivision is sufficient in many cases. Most pro¬ 
grammers implementing a uniform voxel traversal scheme 
usually employ a much deeper subdivision level. Table 4 
illustrates an example for the spheres image. Note that space 
usage indicates the total memory usage by the program, and 
note the alarming increase in memory as subdivision level 
increases for even such a simple scene. 


Grid Res 

Time 

Space Usage 

15x9x5 

6:43 

1,003,852 bytes 

15x15x15 

6:44 

1.221,752 bytes 

20x20x20 

6:50 

1,331,768 bytes 

30x30x30 

7:12 

1,801,816 bytes 


Table 4: Sphere Image with different Subdivision 

In addition, note the comparatively small speed differ¬ 
ence in table 1 between the different subdivision levels. With 
our optimizations and having only to ray trace polygons, we 
expect that the difference will even be narrowed more. 

10. Optimization Extensions 

The list of optimizations mentioned in this paper can 
be trivially extended to other surface types as well as culling 
approaches. For example, the main surface type we con¬ 
sidered here is the polygon. However, the use of object 
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coherence with respect to the ray bounding box (sections 5 
and 6) can be applied to other surface types such as general 
quadrics, parametric and implicit surfaces, etc. In fact, the 
speedups should be even superior due to the more expensive 
ray-surface intersection routines for the complex surfaces and 
the availability of more object coherence (as compared to tiny 
polygons). 

The object coherent ray bounding boxes can be applied 
to most voxel-based cullers [Glas84] [Aman87] [Synd87] 
[Jeva89]. Even with hierarchical voxel structures [Glas84], 
we can apply these optimizations and reduce the voxel subdi¬ 
vision or in case the maximum depth of the voxel is reached 
but the voxel is still over-populated. Furthermore, if voxels 
are created on the fly [Jeva89] as needed, as opposed to all 
preprocessed voxels, then a much higher polygon count 
should be the limit used before further voxel subdivision is 
done. 

11. Conclusions and Further Discussions 

We have listed some simple but rewarding optimiza¬ 
tions that can be easily achieved for ray tracing polygons. 
They have the advantage of requiring neither additional 
memory nor substantial additional floating point computation. 
Some of the optimizations can be applied to other surface 
types and intersection cullers as well. 

Another optimization idea is to generate cartoon 
reflections for planar polygons. Such reflections place a 
decay/fading factor on its intensity, where reflections have no 
visible effect after a certain maximum distance maxDist. A 
simple decay factor can be [(maxDist - tHit) / maxDist] k \ 
where tHit < maxDist. Voxel traversal and ray-polygon inter¬ 
sections will be halted beyond maxDist , but approximate 
reflection information will usually already have been gen¬ 
erated. Mow acceptable and useful is this approximation? 
Could this decay apply to shadows to fake ambience (shadow 
intensity as a function of the distance to the closest occluding 
object) as well? See cartoon reflections image, where the 
floor has parameters maxDist = 5 for reflections, max¬ 
Dist = 10 for shadows, and k = 1, i.e. linear decay. 

Final thought: if our subdivision level is really quite 
small and most polygons fit in few voxels, then do we really 
need the ray ID concept in this environment? Based on other 
experiences [Sung91], the raylD may not always accelerate 
the ray tracing process, probably due to extensive memory 
usage - a 32 bit flag is attached to each polygon. A possible 
alternative is to make the raylD an unsigned short (16 bits), 
and reinitialize all rayID's after 65535 rays are shot. This 
was implemented ontop of our raytracer but found no notice¬ 
able speedups. Other alternatives are needed... 
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Abstract 

Height field ray tracing is one approach to photo¬ 
realistic visualization of terrain. Parametric methods , 
which fall into a new class of height field ray trace 
methods, are introduced*. The height field is 
horizontally sliced into evenly spaced cross-sections. 
For each cross-section, the Euclidean distance from each 
point to the nearest point where the slice cuts through 
terrain is computed off-line. These planes of distance 
values are condensed into parameter planes that encode 
cones of empty space above each height field cell, where 
cone width is bounded by the terrain relief. Parametric ray 
tracing occurs along intersections between rays and these 
cones. Parametric methods are shown to be memory 
efficient and often much faster than the other popular 
height field ray trace methods. 

Keywords: Height Field, Ray Tracing, Distance 
Transform, Parameter Planes, HDDT 
Profile, Incremental Methods, Hierarchical 
Methods, Parametric Methods 

L Introduction 

The probability of mission success associated with 
use of a mission planning and rehearsal system and the 
effectiveness of flight simulator training can be 
dramatically increased by introducing photo-realism. The 
most popular approach to photo-realistic visualization of 
terrain is probably the photo-textured polygon approach. 
This approach represents terrain as a continuous surface 
of planar polygonal facets. Warp coefficients that map 
pixels in simulated views to the source image grid are 
computed for each polygon so that warping can be used to 

1 Research sponsored by Loral Western Development 
Labs. US and international patent applied for on 
parametric ray trace process. 


generate simulated views in real-time. Occlusion is 
handled by some form of hidden surface removal. 

Height field ray tracing is another approach to 
photo-realistic visualization of terrain. There are several 
important distinctions between the photo-textured 
polygon and height field ray tracing approaches. The ray 
tracing approach represents terrain with a simple raster 
data structure of height samples rather than with a vector 
data structure of connected polygon vertices. This allows 
the terrain to be modeled as a continuous surface of 
connected patches that can be curved to enhance realism. 
The ray tracing approach can be parallelized on a pixel- 
by-pixel basis because it does not use hidden surface 
removal, which is inherently sequential, to handle 
occlusion. However, height field ray tracing is less 
popular because it is widely believed to be too 
computationally intensive. Methods for accelerating 
height field ray tracing are reviewed below and an 
efficient new method for height field ray tracing is 
introduced in section 2. A source image and photo¬ 
realistic perspective view generated from it by height 
field ray tracing are shown in Fig.l. 

LI Height Field Ray Tracing 

The database used for height field ray tracing 
contains source images, height fields and perhaps data 
derived from height fields, where height fields are arrays 
of evenly spaced height samples. During height field ray 
tracing, portions of the source image may be swapped 
into RAM from disk but for any local area of interest, the 
height field and any data derived from it should reside 
completely in RAM. The height field ray tracing 
approach maps intensities of source image pixels onto a 
simulation display. The math model associated with the 
simulation is used to characterize lines-of-sight or rays 
through pixels in the simulation display. If, for example, 
the simulation is a perspective view, then the simulation 
model is that of a frame camera. Height field ray tracing 


Graphics Interface ’92 




193 



(a) 

determines where rays first pierce terrain surfaces modeled 
as interpolated (bilinearly interpolated in this study) 
height fields. These pierce-points have ground 
coordinates that can be analytically mapped to source 
image pixels by applying the math model of the source 
image. 

Height field ray tracing is an inherently numerical 
process since the point where a ray first pierces an 
interpolated height field surface can generally be found 
only by tracing the ray out from the viewpoint until a 
pierce-point is encountered. This process has several 
components. First, the rays are characterized 
mathematically. The height field bounding box is then 
used to isolate the segment to be traced on each ray. 
Traversal progresses from point to point on the ray in ray 
trace steps. The height of the ray point at the end of each 
ray trace step is compared to the height of the height field 
cell that it lies above. Each cell is defined by four 
adjacent height field samples that lie at the comers of a 
rectangle. The height of the cell will be taken as the 
height of its tallest comer. If the ray could possibly 
pierce the cell that it lies above, a pierce-point 
calculation is performed [1]. If the ray does not pierce 
that cell, ray tracing continues until a pierce-point is 
found or the ray exits the height field bounding box. 

The ray tracing approach to photo-realistic 
visualization of terrain can be accelerated by reducing the 
number of rays to be traced or by accelerating the ray trace 



(b) 

Fig.l McCall Idaho: (a) reduced resolution copy 
of 5200 x 7200 source image (b) 512 x 512 
oblique perspective view generated from (a) 
by ray tracing every pixel. 

process. The number of rays to be traced can be reduced 
by tracing rays along lines-of*sight through every so 
many pixels on the simulation display and tile warping in 
between. For tile warping , ground coordinates associated 
with every nth pixel in the simulation display are 
computed by height field ray tracing. The source image 
math model is used to analytically map these terrain 
points onto the source image display. Warp coefficients 
that map tile pixels to source image pixels are determined 
for each rectangular tile in the simulation display. Source 
pixels associated with each tile pixel are then computed 
by tile warping, which costs less than ray tracing. 
Acceptable tile sizes are dictated by the nature of the 
terrain relief within tile field of view. 

To accelerate the ray trace process, ray vertical 
coherence is sometimes exploited [3-4]. Rays that lie in 
the same vertical plane (i.e., any plane normal to the 
ground plane) project to the same line on the ground and 
are said to be vertically coherent. In perspective views, 
vertically coherent rays pass through points that all lie 
on the same line in the focal plane and they are arranged 
in order of decreasing steepness along that line. Ray 
tracing can begin from the x (east) and y (north) 
associated with the pierce-point of the next steeper 
vertically coherent ray provided that the terrain surface is 
modeled as a single valued function of two variables. 
This technique accelerates the ray trace process by 
advancing ray trace starting points. It is most applicable 
when every pixel is to be ray traced. To take advantage of 
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ray vertical coherence in perspective view generation, 
pixels must be processed sequentially along lines in the 
simulation display corresponding to vertical planes. 

1.2 Height Field Ray Trace Methods 

Accelerations due to tile warping and ray vertical 
coherence cannot generally be combined because pixels 
on corners of tiles in the simulation display are not 
generally associated with vertically coherent rays. 
But height field ray tracing can also be accelerated by 
using faster height field ray trace methods. These 
methods can be used alone or in conjunction with either 
tile warping or ray vertical coherence to further accelerate 
the photo-realistic terrain visualization process. 

Incremental methods are the standard height field ray 
trace methods [1-5], As illustrated in Fig.2, they traverse 
rays in steps along intersections with height field cell 
walls, i.e., planes containing rows or columns of height 
field samples. These methods are intuitive and brute force 
in the sense that no cells are skipped. A height field 
resides in RAM during incremental ray tracing. 



j height field cells 


Fig. 2 Incremental ray trace steps a through h. 

Hierarchical methods never require more ray trace 
steps than incremental methods and usually require far 
fewer steps [6-7]. They typically run faster than 
incremental methods because they rely on a pre-computed 
quadtree representation of a series of reduced resolution 
height fields. This quadtree resides in RAM during ray 
tracing. In these quadtrees, the height of a quadrant of 
cells is taken as the height of the tallest cell in that 
quadrant. As illustrated in Fig.3, hierarchical methods 
reduce the number of ray trace steps by tracing over entire 
quadrants of cells whenever minimum ray height over that 
quadrant exceeds the height of that quadrant. Hierarchical 


ray tracing proceeds by inspecting quadrants of 
successively higher resolution. It steps over quadrants 
that the ray lies completely above and segments quadrants 
that the ray could potentially pierce into four sub¬ 
quadrants. These quadrants are pushed onto a stack of 
quadtree nodes in reverse order encountered along the ray. 
They are popped and analyzed in the order encountered 
until a pierce-point is found or the end of the ray has been 
reached (i.e., the stack is empty). 

Parametric methods are introduced in the next 
section. They fall into a new class of height field ray 
trace methods. The height field is horizontally sliced 
into evenly spaced cross-sections. For each cross- 
section, the Euclidean distance from each point to the 
nearest point where the slice cuts through terrain is 
computed off-line. These planes of distance values are 
collapsed into parameter planes that encode cones of 
empty space above each height field cell. Cone width is 
bounded by the terrain relief. Parametric ray tracing 
occurs along intersections between rays and these cones. 



height field cells 
Fig.3 Hierarchical ray trace steps a through h. 


2. Parametric Methods 

Consider a class of height field ray trace methods in 
which ray trace steps from any starting point on the ray 
are bounded by where the ray exits a cone-like volume of 
empty space balanced on its apex (see Fig.4). The apex is 
centered on the top of the height field cell that the 
starting point lies above. At any height, cone width is 
bounded by the distance to the closest terrain point that 
high or higher. For each cell, there is a distinct 360° 
terrain profile that defines the bounds of its associated 
cone-like volume. Profile height at a given distance from 
the apex is the height of the tallest terrain point that 
ground distance away or closer. Each 360° terrain profile 
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is thus nondecreasing and symmetrical about its 
associated apex. Parametric ray trace methods refer to 
height field ray trace methods of this type for which upper 
bounds on the 360° terrain profiles are represented by a 
set of curve parameters. 

cone-like volume 



2.1 Distance Transforms of Height Field 
Horizontal Cross-Sections 

Consider an array {z(i,j)} of cell heights, where the 

heights of the shortest and tallest cells are z . and 

min 

z max* s P ecif y K = the number of height field horizontal 

cross-sections (where = stands for "is defined as"). Define 
an increasing sequence {z k k = 0,...,K-1) of uniformly 

quantized heights from z m j n to z max as 
<*> z k =z min +kA z k = 0.-.K-l 


nearest bit map pixel of value one. They contain ground 
distances, in units of height field cells, from each cell to 
the closest cell of height no less than the height of the 
horizontal cross-section. The DT of height field 
horizontal cross-section k is 

(4) D k (i,j) = distance from (i,j) to the closest (i\j') 
such that B k (i' j') = 1 . 

Let us refer to [D k (i,j)} as the kth height distributional 

distance transform (HDDT) plane. Note that for all (i,j), 
D k+1 (ij) > D k (i,j) and D Q (ij) = 0. 

Parametric ray trace methods require exact Euclidean 
DT's at every pixel. There are several methods for 
computing Euclidean DT’s of bit maps [8-12]. Some 
produce approximations while others yield exact results. 
The unified distance transform algorithm of [12] was used 
to generate DT's here because it rapidly computes exact 
Euclidean DT's of arbitrary bit maps at every pixel even 
on general purpose computers. 

A height field of the McCall Idaho area obtained 
from the USGS is visualized in Fig.5(a) as a gray-scale 
intensity field in which high intensity corresponds to 
high altitude. The height ranges from 1489m to 2270m, 
the sample spacing is 30m and the height field has 
457x323 samples. As indicated in Fig.5(a), the terrain 
relief in the McCall quadrangle is moderate and most of 
the highest altitudes occur in the northern portion. A 
horizontal cross-section of this height field at a height of 
approximately 1890m is shown in Fig.5(b), where the 
black regions correspond to where the horizontal plane 
cuts through the terrain. The complete exact Euclidean 
distance transform of this height field cross-section is 
visualized in Fig.5(c) as a gray-scale intensity field in 
which high intensity corresponds to large distance from 
black regions in the cross-section. 


where the uniform height quantization A z is given by 


2.2 Parameter Planes 


< 2 > 


Then for k = 0,...,K-1, the bit planes 
(3) B k (i,j) = {* 


z(i.j) £ z k 

otherwise 


are the height field horizontal cross-sections at heights 
z k . These cross-sections can be used collectively to 

visualize the distribution of terrain heights. The number 
of ones in B k decreases as k increases. Moreover, the 

pixels of value one in B k+1 are a subset of those in B k 
and B 0 (ij) = 1 for all (ij). 

Distance transforms (DT's) of these bit maps are 
arrays of distances from each pixel in the bit map to the 


HDDT planes can be pre-computed for stacks of 
height field horizontal cross-sections and can 
conceptually be stored in a data structure called an HDDT 
stack. For each height field cell (i,j), there is a non¬ 
decreasing HDDT sequence {D k (i,j) k = 0,...,K-1) of 

Euclidean distances from cell (i,j) to the nearest cell of 
height no less than the cross-section heights z k . In the 

limit as A z -»0, these sequences become continuous non¬ 
decreasing mappings of ground distance vs. height, called 
HDDT profiles for cells (i,j). However, HDDT profile 
values are pre-computed for only certain height 
quantizations z=z k< 

To reduce RAM requirements, suppose that for each 
cell (i,j), the HDDT sequence {D k (i,j) k = 0,...,K-1) is 

replaced by a simple parametric representation that acts 
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Fig.5 (a) A height field visualized as intensity increasing with altitude (7.5* USGS McCall quadrangle, 
Idaho), (b) Height field horizontal cross-section at approximately 1890m. (c) Euclidean distance 


transform of (b). 

as a lower bound to the true HDDT profile. Then only the 
parameters of these representations would need to be 
stored and read into RAM. Arrays of cell parameter 
values, called parameter planes , would replace the HDDT 
stack, where each plane corresponds to a different 
parameter. These parameter planes can be updated off-line 
each time a new HDDT plane is generated. Once updated, 
that HDDT plane can be discarded. Parameter planes can 
thus be based on an arbitrary number of height field cross- 
sections because no more than one HDDT plane ever 
needs to reside in RAM at any given time during 
parameter plane generation. 

The simplest parametric representations possible 
for HDDT profiles are linear lower bounds. These 
representations have only two parameters, namely a 
slope and a z intercept. For each cell (i,j), the z intercept 
must be chosen greater than or equal to (equal to in this 
paper) the height z^ of the first horizontal cross-section 

for which D^ij^O. Once the z intercept for cell (i,j) has 

been specified, the slope for cell (i,j) is updated each time 
a new HDDT plane is generated. The slope for cell (i,j) 
must be chosen as the slope of some line that passes 
through the z intercept for cell (i,j) and a point (D^i.j), 

z k+i) for which z^ + i exceeds the z intercept. Of all such 

lines, the line of least slope must be taken as the linear 
lower bound. 

Let nip(i,j) and Zp(i,j) be the slope and z intercept 
parameter values associated with cell (i,j). Then {nip(i,j)) 
and {Zp(ij)} are the slope and intercept parameter planes. 

These parameters have an interesting physical 
interpretation. Zp(i,j) can be thought of as the height of 

the apex of a cone of empty space situated directly above 
cell (i,j) balanced on its apex. The reciprocal of nip(i t j) is 

the slope of a line which, when swept through 360° about 
the apex, defines a cone surface. In effect, the parameter 
planes encode conical volumes of empty space situated 


above each height field cell, where cone width is bounded 
by the terrain relief. As shown in Fig.6, parametric ray 
trace steps occur along intersections between rays and 
surfaces of such cones. Unlike the cone-like volume in 
Fig.4, these cones have an apex that may float above the 
associated height field cell and they are parameterized 
with line slope and intercept parameters. Each cone in 
Fig.6 would fit inside an associated cone-like volume 
such as the one depicted in Fig.4. 


cones of empty space 



Fig.6 Parametric ray trace steps a through c as 
intersections between rays and surfaces 
of cones of empty space. 

Incremental ray tracing requires RAM for storage of 
the height field. Parametric ray tracing requires RAM for 
storage of the height field and its parameter planes. If all 
data is stored in 4 byte floating point, the RAM 
requirements for linear parametric ray tracing are only 3 
times greater than those for incremental ray tracing. In 
contrast, hierarchical ray tracing requires RAM for 
storage of a quadtree representation of a height field 
resolution pyramid. The quadtree data structure is more 
complex than the simple raster data structures required for 
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incremental and parametric ray tracing. Among other 
things, the quadtree nodes must contain quadrant height, 
xy quadrant bounds and pointers to four child quadrant 
nodes. Assuming that each node requires 32 bytes, 
hierarchical methods require roughly 10 times more RAM 
than incremental methods. 

2.3 Analytical Computation of Parametric 
Ray Trace Steps 


A 2 2 1/2 

(7) 


If d z =i*O f the ray xy traversal line can be expressed as 

(8) D = m^z-Zj) 

(9) m r = A*d r /d z , d z *0 , 


In three-dimensional local planar coordinate 
systems with x (east), y (north) and z (height) axes, rays 
can be characterized by a starting point (or viewpoint) 

Pc = [x c ,y c ,z c ] and a direction vector d = [d v ,d„,dl as 

S 5 5b X Jr Z 

(5) p = P s + t d, t>0 

where the variable t specifies distinct points p = [x,y,z] 
on the ray. p s and d are derived from the parameters of 

the line-of-sight through some pixel in the simulation 
display. These parameters are dictated by the math model 
associated with the simulation. If the simulation model is 
that of a frame camera, then the simulation is a 
perspective view so p g is camera position and d is a 

function of pixel coordinates, camera focal length and 
camera rotation (tilt, swing, azimuth). At zero rotation, 
the camera frame is oriented such that its x axis points 
straight east, its y axis points straight north and its z 
axis (which is the optical axis pointed in the opposite 
direction) points straight up. The camera frame is 
obtained by rotating the zero rotation frame by the 
azimuth angle (from 0° to 360°) clockwise about its z 
axis, rotating the resulting frame by the tilt angle (from 
0° to 180°) about its x axis and rotating the resulting 
frame by the swing angle minus 180° (from 0° to 360°) 
clockwise about its z axis. Physically, azimuth reflects 
how far off north the camera is pointed. Tilt is the 
rotation between a ray pointed straight down and the 
optical axis. The camera points up when tilt exceeds 90° 
and points down when it is less than 90°. Swing reflects 
how much the camera is '’twisted" once azimuth and tilt 
have been applied. 

For any ray, the xy (ground) distance D traversed, in 
units of height field cells, from the ray point at height z f 

to the ray point at height z is given by the ray xy 
traversal line 

(6) (d z A)D = d r (z-zp 

where A is the distance (in meters) between adjacent 
height field samples and 


i.e., if d z *0, the ray xy traversal line has slope m r and 
intercept z f . For rays of constant height (d z =0), D takes 
on all real values at z=z f but is undefined for all other z. 

Thus, D is a linear mapping of xy distance vs. ray height 
traversed. It is decreasing for down-looking rays (d z <0), 

increasing for up-looking rays (d z >0) and of infinite 

slope for constant height rays. Moreover, ray xy 
traversal lines have slopes m^. that become steeper as 

their rays become more shallow (i.e., as d z decreases in 

magnitude). 

Let [* r ,y r ,z r ] be a point on the ray. Let D* be the xy 

distance associated with the parametric ray trace step from 
[XfO^Zf]. Let z* be the height of the point stepped to on 
the ray. From any point [x r ,y r ,z f ] situated over height 

field cell (i,j) on the ray, [z*,D*] is dictated by how the 
ray is positioned over the height field and the terrain 
relief. The ground distance mapping associated with 

[x f ,y r ,z r ] is dictated by and how the ray is positioned. 

The HDDT profile associated with cell (i,j) is dictated by 
the terrain relief. D* is limited by the distance to the 
point on the ray at height z* whose ground distance from 
[x r ,y r ,z r ] equals the ground distance from [x^y^ZjJ to the 

closest cell of height no less than z*. The maximum 
possible value for D* consistent with parametric height 
field ray tracing methods is given by the intersection 
closest to the z axis between the ray xy traversal line and 
the HDDT profile. 

Parametric methods compute lower bound estimates 
for these maximum ray trace step lengths as intersections 
[z*, D*] between ray xy traversal lines and parametric 
representations of HDDT profiles for cells (i j). When the 
parametric representations are linear, these intersections 
are easy to compute analytically (see Fig.7). The linear 
HDDT profile for cell (i,j) is given by 

(10) D = m p (i,j) [z - Zp(i,j)] . 

The intersection between the linear HDDT profile for cell 
(i,j) and the ray xy traversal line associated with ray point 
[ x r ,y r ,Zr] t ^ iat ** es a b° ve cell (i,j) is 
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( 11 ) 


( 12 ) 


D* = 


I m p( i O) z p ( i .j)~ rn r z r 

j m p (i,j)-m r 
^undefined 

{ m p (i.j)[z*-z p (i,j)] 
undefined 


d z *0, m p (i t j)^m r 

d z *0, m p (i,j)=m r 
z* defined 
z* undefined 


In Fig.7, the solid line is the linear HDDT profile (10) and 
the four dashed lines are ray xy traversal lines (6) or (8) 
with slopes m^i) i=l,2,3,4 that intersect the rays at 

[z*(i), D*(i)] i=l,2,3. Ray 1 points down, ray 2 is 
constant height and rays 34 point up. As for rays 1-3, if 
D* > 0, the parametric ray trace step from the current ray 
point [x f ,y r ,z r ] is to the ray point [x* ,y*,z*J an xy 

distance D* away where 

f[Xr-Kz*-z r ) d x / d 2 .y r -Kz*-z r ) d ^d z ] d z *0 

(13) [x*,y*]=s 

[(x r +<D*A)d x /d r> y r +(D*A)d y /d r ] d z =0 

If D* is undefined, less than zero or the ray point stepped 
to ends up above the same cell as the previous ray point, 
then it is necessary to jump start the parametric ray trace 
process. Jump starting is the process of stepping 
incrementally to the next cell so that parametric ray 
tracing can continue. 

Furthermore, as for ray 4 in Fig.7, if the ray points 
up and for D>0, the ray xy traversal line associated with 
the current ray point lies completely beneath the 
associated linear HDDT profile, then the ray can never 
pierce the ground. In this case, there is no xy distance 
from the current ray point for which the tallest terrain 
point that far away is at least as high as the ray point that 
distance ahead. In other words, the portion of the ray 
emanating from the current ray point never pierces the 
cone associated with the cell that the current ray point 
lies above. This mechanism for determining that certain 
rays never pierce the ground without having to ray trace 
further allows parametric ray trace methods to process 
certain rays with incredible efficiency. In addition, if the 
portion of the ray emanating from the current ray point 
does pierce the cone but the pierce point lies outside the 
height field bounding box, then the ray never pierces the 
height field surface and parametric ray tracing is 
complete. 

2.4 Parametric Height Field Ray Trace 
Algorithm 

The parametric height field ray trace algorithm can 
be summarized as follows: 


1. Initialise: 

a. if ray does not pierce height field bounding box 

then return 

else [x r ,y r ,z r ] <— ray trace starting point 

b. If ray points straight up or starts under height field 

then return 

c. (i.j) index of cell associated with ray trace 

starting point [x r ,y r ] 

if ray does not point up then 

[x r ,y r ,z r ] <- point where ray exits cell (i,j) 
if d z *0 then compute n^ 
status 4- 1 

while status = 1 

2. Compute Pierce-Point : 

if ray pierces cell (i,j) then return pierce-point 

3. Determine Next Parametric Ray Trace Step : 
if z [ > z(i,j) then 

a. if d z > 0 and z f > Zp(i,j) and n^. < nripfij) 

then return 
Compute [z*,D*]. 

if D* > 0 then [x r ,y r ,z r ] <- [x*,y*,z*J 

b. (i’o ) <- (i.j) 

(i,j) 4- index of cell associated with [x r ,y r ] 

If (ij) out of height field bounds 
then return 

4. Jump Start By Determining Next Incremental Rav 
Trace Step : 

if z x < z(i,j) or (z r > z(i,j) and i'=i and j'=j) then 
if d z > 0 then 

[ x r ,y r .Zrl point where ray exits cell (i,j) 

(i,j) 4“ index of cell for which [x r ,y r ,z r ] is 

the ray entry point 
If (i,j) out of height field bounds 
then return 

else 

(i,j) 4- index of cell for which [x I ,y r ,z r ] is 

the ray entry point 
if (i,j) out of height field bounds 

then return 

[* r .y r ,z r ] point where ray exits cell (i,j) 

To summarize, the ray trace starting point is used as 
the first current ray point. A pierce-point calculation is 
required if somewhere over the current cell, ray height 
drops below the height of that cell. If no pierce-point is 
found, the next parametric ray trace step is determined. 
Whenever the ray points up and the ray xy traversal line 
associated with the current ray point lies completely 
beneath the associated linear HDDT profile for D>0, the 
ray never pierces the ground. Otherwise, if a parametric 


& 
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ray trace step could not be determined or the ray point 
stepped to lies in the same cell as the current ray point, it 
becomes necessary to jump start the ray trace process by 
determining the next incremental ray trace step. The 
current ray point is then replaced by the ray point stepped 
to. If the new current ray point lands outside the height 
field bounding box, ray tracing is finished. Otherwise, 
the process iterates so that the next ray trace step can be 
determined. 


D 



Fig.7 Parametric ray trace steps as intersections 
between ray xy traversal lines (dashed) 
and a linear HDDT profile (solid). 


3. Experimental Results 

The amount of time required to generate photo¬ 
realistic views of terrain by height field ray tracing 
depends on the size and resolution of the height field, the 
obliqueness of the view, the number of rays traced, the 
speed or number of ray trace processors and the efficiency 
of the ray trace process. For height fields of fixed 
resolution, the number of potential ray trace steps is 
directly proportional to the height field range in east or 
north. For height fields of fixed area coverage, the 
number of potential ray trace steps is directly 
proportional to height field resolution. Oblique views of 
terrain can be particularly time consuming to generate by 
height field ray tracing because shallow rays often 
traverse long distances over the ground before they pierce 
terrain. View generation times vary in inverse 
proportion to the square of the spacing between ray traced 
pixels in the simulation display since tile warping, which 
is generally much cheaper than ray tracing, can be used to 
fill in the gaps. View generation times also vary in direct 
proportion to the number of ray trace processors, 
processor speed and the efficiency of the implemented ray 
trace process. The effect that the height field ray trace 
method has on ray trace efficiency is investigated in this 


section. All tests were performed on the same 10 MIPS 
Sun 4/280 processor. 

An experiment was performed with the 16 oblique 
views, listed in Table 1, of the McCall height field in 
Fig.5(a) (z m i n =1489 m, z max =2270 m). An imaginary 

frame camera with a focal length of 50 mm was devised. 
The interior orientation was chosen such that each pixel 
corresponds to 1/4 mm on an imaginary focal plane and 
pixel (0,0) lies at the center of a 512x512 simulation 
display. Each view thus has a wide opening angle of 
about 104°. Well-distributed rays were obtained by 
processing lines-of-sight through every 16th pixel in 
each view. 

For each view, the number of rays that pierce the 
terrain (npj erce ) out of 1024 is recorded in Table 1. Since 

all 16 viewpoints lie within the height field bounding 
box, all rays must be ray traced. The mean numbers of 
incremental, hierarchical and parametric ray trace steps 
over all rays traced in all views (i.e., nj, njj and np) are 

recorded in Table 1. The amount of time (t) that it takes to 
ray trace 16 views (every 16th pixel) is recorded in the 
last row of Table 1. K=160 height field horizontal cross- 
sections (A z «5 m) were used to generate the parameter 

planes. The average number of incremental, hierarchical 
and parametric ray trace steps ranged from roughly 130 to 
275, 6 to 9 and 2 to 6 respectively. The number of 
incremental to hierarchical ray trace steps averaged 
roughly 27 to 1. The number of incremental to parametric 
ray trace steps averaged roughly 56 to 1. The number of 
hierarchical to parametric ray trace steps averaged 
roughly 2.1 to 1. The hierarchical method ran roughly 7 
times faster than the incremental method. The parametric 
method ran roughly 40 times faster than the incremental 
method and roughly 6 times faster than the hierarchical 
method. Parametric ray trace steps cost somewhat more 
than incremental ray trace steps but considerably less 
than hierarchical ray trace steps. 

The three ray trace methods can also be compared 
with respect to their memory requirements. The 
incremental method required roughly 0.59 Mbytes for the 
height field (4 bytes per height sample). The hierarchical 
method required roughly 6.3 Mbytes for the quadtree 
representation of the height field resolution pyramid (32 
bytes per quadtree node). The parametric method required 
roughly 1.75 Mbytes for the height field and its two 
parameter planes (4 bytes per parameter value). The 
ratios of hierarchical to incremental and parametric 
memory requirements were roughly 10.7 to 1 and 3.5 to 1 
respectively. The ratio of parametric to incremental 
memory requirements was roughly 3 to 1. 

Although the number of incremental ray trace steps 
basically increases in direct proportion to height field 
resolution, it is theorized that the number of parametric 
ray trace steps remains relatively constant. If true, the 
ratio of incremental to parametric ray trace run times 
would increase in proportion to height field resolution 
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Table 1: Test View Parameters and Ray Trace Method Performance 


view 

viewpoint (meters) | 

orientation (degrees) 

n pierce 

(£1024) 

"i 

n H 

n p 

x s 

y s 

z s 

tilt 


azimuth 


569400 

4969900 

1800 

86.3 

176.7 

54.6 

518 

246.76 

6.14 

2.94 


573975 

4982200 

2200 

72.1 

180.1 

0.0 

600 

219.26 

8.23 

3.89 


573975 

4982200 

1650 

112.2 

180.1 

0.0 

345 

262.32 

8.39 

3.72 


569250 

4976235 

2000 

57.3 

180.2 

89.9 

748 

164.34 

9.34 

4.13 

K9 

569250 

4976235 

2000 

80.2 

180.1 

90.0 

557 

240.21 

8.00 

3.59 

\m mm 

578700 

4976235 

2200 

57.3 

179.8 

-89.9 

717 

163.54 

9.48 

5.99 


578000 

4974000 

1650 

83.2 

180.8 

-56.2 

555 

195.83 

6.69 

4.24 

8 

569230 

4975380 

1800 

82.5 

179.8 

76.4 

552 

229.33 

8.53 

3.27 

HI 

573500 

4982000 

1700 

78.1 

179.1 

-11.7 

584 

196.40 

8.37 

4.02 


572800 

4975600 

2200 

76.9 

174.0 

81.6 

571 

179.37 

7.98 

3.00 

■a 

578500 

4974000 

1700 

86.5 

183.2 

-57.5 

538 

212.71 

8.96 

3.81 

^lEIt 

569400 

4969900 

2200 

73.8 

180.2 

48.3 

584 

248.08 

6.52 

2.91 

H 

578500 

4970000 

2200 

53.2 

186.4 

-44.8 

760 

172.19 

8.16 

4.67 

mm 

569100 

4976235 

2000 

91.7 

180.7 

89.8 

474 

273.70 

7.27 

3.11 

■9 

574200 

4981000 

1600 

119.8 

182.1 

-76.0 

302 

218.09 

6.22 

3.34 





01 8 

181 * 

8* d 

HKvIl 

130.08 

7.84 

3.28 




mm^M 




557.88 

209.51 

7.88 

3.74 

MBM 


czz 






350 

51 

9 


References 


and the benefits of parametric ray tracing would be 
augmented. Work is currently under way to study the 
effect that data field resolution has on the parametric ray 
trace process. 

4. Conclusions 

Parametric ray trace methods often generate oblique 
photo-realistic views of terrain much more rapidly than 
hierarchical methods, which often generate such views 
much more rapidly than incremental methods. The 
number of ray trace steps attributable to parametric and 
hierarchical methods can theoretically never exceed the 
number attributable to incremental methods. 

Parametric and hierarchical methods require fewer ray 
trace steps than incremental methods because they use 
results of height field pre-processing that require 
additional RAM. Incremental methods require RAM for 
one height field. Hierarchical methods require RAM for a 
quadtree representation of a height field resolution 
pyramid. Linear parametric methods require RAM for one 
height field and two parameter planes. Linear parametric 
methods require roughly 3 times more RAM than 
incremental methods. Hierarchical methods require 
roughly 10 times more RAM than incremental methods. 
However, the hierarchical data structure is more complex 
than the simple raster data structures associated with 
incremental and parametric ray tracing. 
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Abstract 

While almost all research on image representation 
has assumed an underlying discrete space, the 
most common sources of images have the structure 
of the continuum. Although employing discrete 
space representations leads to simple algorithms, 
among its costs are quantization errors, significant 
verbosity and lack of structural information. A 
neglected alternative is the use of continuous space 
representations. In this paper we discuss one such 
representation and algorithms for its generation 
from views of 3D continuous space geometric 
models. For this we use binary space partitioning 
iLees for representing both the model and the 
image. Our approach falls under the general rubric 
of visible surface algorithms, providing an object- 
space algorithm which under certain conditions 
requires only sub-linear time for a partitioning 
tree represented model, and in general exploits 
occlusion so that the computational cost converges 
toward the complexity of the image as the depth 
complexity increases. Visible edges can also be 
generated as a step following visible surface 
determination. However, an important contextual 
difference is that the resulting image trees are used 
in subsequent continuous space operations. These 
include affine transformations, set operations, and 
metric calculations, which can be used to provide 
image compositing, incremental image modification 
in a sequence of frames, and facilitating matching 
for computer vision!robotics. Image trees can also 
be used with the hemicube and light buffer 
illumination methods as a replacement for regular 
grids, thereby providing exact rather than 
approximate visibility. 


Discrete vs. Continuous Space 

We have come to think of images as synonymous 
with a 2D array of pixels. However, this is an artifact 
of the transducers we use to convert between the 
physical domain and the informational domain. 
Physical space at the resolution with which we are 
concerned is most effectively modeled 


mathematically as being continuous, that is, as 
having the structure of the Reals (or at least the 
Rationals) as opposed to the structure of the 
Integers. Modeling space as being defined on a 
regular lattice, while simple, is verbose and induces 
quantization which reduces accuracy and can 
introduce visible artifacts. Using nothing other than 
a lattice for the representation provides no image 
dependent structure such as edges. 

Consider applying to a discrete image an affine 
transformation, an elementary spatial operation. 
The solution for this is developed by reasoning not 
merely in discrete space but in the continuous 
domain as well: samples are used to reconstruct a 
"virtual” continuous function which is then 
resampled. However, the quantization effects can 
become rather apparent should the transform 
entail a significant increase in size and a rotation by 
some small angle, despite the use of high quality 
filters. This is due to such factors as ringing, 
blurring, aliasing, and anisotropic effects which 
cannot all be simultaneously minimized (see, for 
example, [Mitchell and Netravali 88]). More 
importantly, discontinuities become increasingly 
smeared as one increases the size, since the 
convolution assumes a band-limited signal, i.e. an 
image with no edges. This has practical implications 
when texture mapping is used to define the color of 
surfaces in 3D: since a texture map can be enlarged 
arbitrarily, a brick texture, for example, will become 
diffuse instead of exhibiting distinctly separate 
bricks. 

Now consider applying affine transformations to 
images represented by quadtrees, a spatial 
structure, developed within the context of a finite 
discrete space, for reducing verbosity and inducing 
structure on an image. The algorithm for 
constructing the new quadtree of the transformed 
image seems relatively complicated when compared 
to the corresponding algorithms for continuous 
space representations: it must resample each 
transformed leaf node and construct an entirely 
new tree. In contrast, boundary representations, 
simplical decompositions, or binary space 
partitioning trees only require transforming points 
and/or hyperplanes (a vector-matrix product), and 
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no structural changes are required. An extremal 
example of this difference is the quad-tree 
representation of a square occupying a quadrant, 
which requires 5 nodes, but when slightly rotated 
or translated the number of nodes is on the order 
of the number of pixels lying on its boundary (say 
about 4k for a lk x lk grid). This rather dramatic 
metamorphosis illustrates quite clearly that the 
quadtree reflects the nature of a finite discrete 
space, a nature differing from that of the 
continuum, and that applying arbitrary affine 
transformations in discrete space can affect the 
structure of the representation, introducing 
quantization noise and requiring more complicated 
algorithms. 

We are inclined to state a stronger proposition: 
discrete space, as a regular lattice, supports weakly 
the semantics of the continuum. Assuming this, the 
difficulties with transforming pixel arrays and 
quadtrees is not so unexpected. A good model for 
images is one that treats them as functions mapping 
a continuous 2D domain to a color space (the 2D 
domain may be unbounded). Discrete space 
representations are then treated as approximations 
of this function, or as evaluations achieved by 
point-sampling the domain, and discrete space 
operations are then constructed as approximations 
to their continuous space analogs. To display the 
image, conversion to a discrete representation 
would still be needed, but this now becomes strictly 
an issue of sampling the image function. (This 
argument should not be confused with the random 
vs. raster scan distinction, which is a question of 
transducer technology, not of computational 
technology.) With this said, we will now consider 
methods of generating continuous space image 

representations from 3D continuous space 
geometric models. 

Visible Surface Algorithms 

The context in which continuous space image 

representations are most easily produced is 
synthetic image generation. Here one begins with a 
3D geometric model, defined using continuous space 
methods, from which a continuous space image 

representation is generated. This idea appeared 
very early in the development of visible surface 
algorithms, and in [Sutherland, Sproull and 

Schumacker 74] such algorithms were called 

object-space algorithms. But the approach has been 
neglected in favor of solutions utilizing quantized 
spaces (except in the Computational Geometry 

community). 

The algorithm of [Weiler and Atherton 77] is a 
well known example of a continuous space method 
for generating images, and since it resembles 

closely our own approach, we will describe it in 
some detail. The algorithm operates on a set of 

polygons defined in a 3D post-perspective screen- 
space; thus, all projectors are parallel to the z-axis. 
Each polygon is represented by a boundary 


representation of some variety. Presumably the 
polygons are the faces of a collection of polyhedra, 
but this property is not relied on. The algorithm 
proceeds by recursively partitioning space until 
homogeneous regions of the image are generated. 
Homogeneity in this case means, in 2-space, a 
region in which only one polygon is visible, or in 3- 
space, a region which is entirely visible or entirely 
occluded. The output of the algorithm is a set of 
polygons in 2-space with disjoint interiors whose 
union forms the image. These polygons are in 
general non-convex and contain holes. 

At each point in the recursion, a region r of 
space is partitioned into two sub-regions, which we 
denote as r" and r + . The partitioning set used is a 
3-space polygonal cylinder determined by the 
boundary of a polygon p, chosen from among those 
polygons that intersect r (Figure 1). The faces of the 
cylinder are orthogonal to the xy-plane, and so 
contain those projectors which go through the 
boundary of p. Since p may be of any genus, the 
sub-regions created by partitioning with p are not 
necessarily connected and are rarely convex. All 
polygons, including p, are then partitioned into 
subsets lying in r“ and r+, where we take r’ to be 
the sub-region containing p, i.e. the interior sub- 
region, and r+ to be the exterior sub-region. 



Weiler-Atherton Algorithm 
Figure 1 

Whenever there is some polygon p 1 in r with 
supporting hyperplane h such that p 1 = h n r, then 
all polygons lying "behind” p' are occluded by p\ 
such a polygon was called a surrounder in the 
literature on visible surface algorithms of the 70’s, 
taken from the analogy of a 2D window being 
surrounded by a polygon. The algorithm selects 
whenever possible the plane of such a polygon as 
the partitioning set and then treats the "far region" 
of p* as homogeneous, i.e. as being totally occluded, 
and so terminates recursion in that region and 
discards the occluded polygons. Note that the 
cylindrical partitioning by p above results in p 
being a surrounder for r - . Finally, whenever a 3- 
space region is generated containing no polygons, 
this region is necessarily homogeneous. 
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While the algorithm as just described is all that 
is required to generate a set of polygons forming 
the image, we have said nothing about which 
polygon is chosen as the partitioner when no 
surrounder is present. A typical technique aimed at 
improving the performance is to initially sort the 
polygons in z using for each polygon the smallest z- 
coordinate from among its vertices, and then to 
maintain this ordering when polygons are 
partitioned. The partitioner in the absence of a 
surrounder is then the first in this ordering among 
those intersecting a region. In the presence of 

multiple surrounders, the closest one is chosen. 

A similar but lesser known approach was 
described in an unpublished paper by Ivan 

Sutherland [Sutherland 73], where he develops a 
visible surface algorithm inspired by the ideas used 
in the ID sorting algorithm quicksort. Its output is 
the same as the above algorithm, i.e. a set of disjoint 
polygons, and it differs from that algorithm 
primarily in one aspect: in the absence of a 

surrounder, the partitioning set is a plane through 
only one edge of a polygon in r. The plane then is 
orthogonal to the xy-plane, or equivalently, it 
contains the edge and the center of projection 
(Figure 2). Selecting which edge to use at each point 
in the recursion is a heuristic process. Sutherland 
tried several heuristics without reaching any firm 
conclusions about what method was best. It is 

interesting to note that Sutherland's paper also 
contains a section discussing how this algorithm can 
be used for shadow generation, transparency and 
collision detection. 



Partitioning tree algorithm 
Figure 2 

As pointed out in [Harp 86], this algorithm can 
be treated as a binary space partitioning tree 
algorithm in that it uses a recursive partitioning by 
arbitrary hyperplanes; however, it does not 
generate a tree explicitly, a crucial distinction. This 
is not surprising given the original inspiration, 
quicksort. For indeed quicksort can be seen as 
implicitly constructing a ID binary search tree. 


which in turn can be interpreted as a ID 
partitioning tree. There is a somewhat subtle but 
important difference however: sorting has been 
developed in terms of points whereas space 
partitioning is in terms of hyperplanes. In ID, and 
only in ID, points and hyperplanes have the same 
dimension, viz. 0, and so it is easy to confuse them. 
But hyperplanes are (d-l)-dimensional not 0D sets, 
as are points. And they have an orientation that 
distinguishes the two halfspaces induced on a d- 
space, an orientation that can be used for ordering. 
Points have no such orientation, nor do they 
partition space, and so cannot be used to order d- 
space, d>l. This is one way to see why sorting 
algorithms are not applicable in dimensions other 
than ID. Indeed, if we "attach” the ordering 
relationship to a ID point, we then have a ID 
hyperplane. 

It seems apropos before leaving this section to 
discuss briefly the visible surface algorithm by John 
Wamock [Warnock 69]. It was the first recursive 
space partitioning, visible surface algorithm, and it 
follows the general scenario outlined above, the 
main difference being that the partitioning 
hyperplanes are not determined by polygonal 
edges (also, polygons were not explicitly 
partitioned). Today we see it as using a quadtree 
partitioning scheme. However, like the Sutherland 
algorithm, no explicit tree representation is 
generated; its output is a set of visible squares 
typically drawn directly into a pixel array. It is in 
effect a discrete space solution (also called a screen- 
space algorithm). It was to a certain degree the 
verbosity of this discrete solution that motivated 
the development of the two previously described 
continuous space algorithms. 

Partitioning Troos 

The binary space partitioning tree was originally 
developed in the context of visible surface 
determination. (The appendix contains a summary 
for those unfamiliar with the method.) [Schumacker 
et al 69] developed an incipient version that 
involved manual creation of a binary tree of 
vertical separating planes so that each object was 
separated from all other objects in the scene. The 
tree could then be used to generate a view- 
dependent visibility priority ordering. In [Fuchs, 
Kedem and Naylor 80] and [Naylor 81] three 
advancements were made: 1) the objects 
themselves were represented by the tree, 2) tree 
generation was automatic, 3) a dimension 
independent representation of space was 
introduced along with the name "binary space 
partitioning tree". As noted above, Sutherland also 
developed a number of ideas using this approach 
without generating a tree, the lack of which 
presumably contributed to his not realizing the 
connection between his work and that of 
[Schumacker et al 69]. 
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A generalized view of partitioning trees sees 
them not simply as representations of polytopes 
but as a representation of functions whose domain 
and range are continuous spaces of finite 
dimensions di and d2 respectively: f:Xe S d i => 
Y e S d 2. The partitioning tree partitions the domain 
into a hierarchical collection of sub-domains. Within 
each sub-domain a value-continuous function fj 
defines the value of f within that sub-domain 
(typically, fj is defined for all of S d i as well, 
although this is not essential). All points in S d i at 
which f is value-discontinuous are contained within 
partitioning hyperplanes. This interpretation is 
relevant to the work here since images are 
functions from 2-space to some color space. 

A partitioning tree also provides a structure 
enabling a hierarchical representation of f. As an 
example, consider polytopes. At the cells of the 
partitioning, each f\ is a boolean valued constant 
function indicating whether or not the cell is in the 
set. The polytope is the set of points P = closure( { 
c i I cj is an in-cell} ). Thus f is the characteristic 
function f* for the set P; the algorithm for 
computing f is the point classification algorithm 
given in [Naylor 81] [Thibault and Naylor 87]. A 
useful hierarchical representation of f* can be 
obtained by associating with each region r a 
constant function providing the conditional 
probability of a point being in P given that it is 
known to lie somewhere within r. Thus, for 
example, the value at the root of the tree is the 
expected value of the function. We use this idea 
below to detect regions of the image plane that are 
discovered to be totally occluded yet 
inhomogeneous. 

Partitioning Tree Visible Surface Algorithm 

Consider a 3D geometric modeling system in which 
all geometric sets are represented by partitioning 
trees. An explicit representation of the model can 
be formed by taking the union of all the objects 
comprising the model, resulting in a single tree. This 
model-tree can then be used, along with a 
particular view of the model, to generate a total 
visibility priority ordering on the components of 
the tree. This ordering can be either far-to-near 
(back-to-front) or near-to-far (front-to-back). 
Generating this ordering can be combined with 
view volume clipping, which performs a non¬ 
destructive intersection operation between the 
model and the view volume, generally in sub-linear 
time [Naylor 90b]. 

Regardless of the ordering, a partitioning tree 
representing the image can be generated by 
forming the union of the faces in priority order. 
More specifically, consider a near-to-far ordering 
with the initial value of the image being the empty 
set. 


1) project each face onto the 2D projection 
plane and let the attributes of each face be 
its color. 

2) form the 2-space union of the faces in 
priority order: 

image = Union_Sets( image, face ) 
where the attributes of the image take 
precedent over those of the face. 

Thus faces are projected to 2D regions of the image 
plane, and the effect of higher priority faces 
occluding lower priority faces is achieved by letting 
the attributes of the image tree take precedence 
over those of the current face being '’added” to the 
image. If the faces are represented by partitioning 
trees, then the union can be performed using tree 
merging [Naylor, Amanatides and Thibault 90] 
(figure 3), or if by b-reps, then by the algorithm 
given in [Thibault and Naylor 87]. If the reverse 
ordering is used (far-to-near), then the attributes 
of the new face would take precedence over those of 
the image tree. We see then that visible surface 
problem can be reduced to ordered set operations 
on polyhedral faces. (All of what has been said for 
3D -> 2D holds for any d > 1, since partitioning trees 
and their algorithms are dimension independent.) 



A union operation between two faces with 
attribute precedence 
Figure 3 


With either ordering, non-refractive 
transparency can be supported by using a "merge 
attributes” method that blends colors according to 
their opacity (alpha values). Given two polygons pi 
and p2* in which pi has color ci and opacity ai 
and occludes p2 which has color C2 and opacity a2, 
then the resulting color is ci,2 = (ci*al) + 
(C2*a2) * (1-ai) and opacity is ai,2 = ai + (1-ai) 
* a2 [Porter and Duff 84]. 

In addition the set of visible edges can be 
generated, if desired, by performing a closure 
operation which determines for each sub- 
hyperplane which subsets have a heterogeneous 
neighborhood [Naylor, Amanatides and Thibault 
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90]. So for example, in Figure 3 the subsets of 
hyperplane A that have homogeneous neighbor¬ 
hoods and so would not be on a discontinuity in the 
image are those that separate the two polka-dotted 
cells or two out-cells. This then provides a 
continuous space visible edge (hidden line) 
algorithm as an additional step after the visible 
surface algorithm. 

A 3D variant is obtained by transforming the 
faces into 3D post-perspective screen-space. Then 
each face is considered to define a polygonal cylin¬ 
der as in [Weiler and Atherton 77]. The union 
operation is now on 3D cylinders bounded on the 
near side by the plane of the face (see Figure 2). The 
faces could then be added to the image tree in anv 
order, although near-to-far still has advantages as 
discussed below. Note that the 3D image tree that 
this produces represents in continuous space 
exactly the same function represented in discrete 
space by the standard {frame-buffer, z-buffer) 
structure. 

The algorithm can also be performed in model- 
space, in which the cylinders are instead cones 
whose conical-vertex is the center of projection. 
This then becomes the algorithm present in [Chin 
and Feiner 89] which they applied to shadow 
generation, instead of image/visible-surface gen¬ 
eration 1 . The resulting model-space image tree i;an 
then be transformed by the viewing transform¬ 
ation into screen-space. Working in model-space is 
preferable numerically, as it avoids the problems 
encountered as a consequence of the non-linear 
perspective projection which corn-presses the 
depth at rate of z -2 . Note that this problem can be 
ameliorated somewhat by attempting to match the 
distribution created by the projection to the distri¬ 
bution of floating-point representable numbers. 
Uniformly distributed points in model-space 
become more compressed by the perspective pro¬ 
jection the greater the depth. Floating-point repre¬ 
sentable numbers become more dense the closer 
the value is to 0. The standard mapping of the near 
plane to z=0 and the far plane to z=l results in a 
mismatch: the greater the model-space depth the 
further the projected depth-value is from 0. This is 


1 Both their ideas and our ideas on this subject occurred 
independently. We first realized the potential presented 
here during the period in which we were developing 
the thesis that partitioning trees could provide a 
representation of polytopes [Thibauit and Naylor 87]. 
Being able to solve analytically the visible- 
surface/shadow problems with partitioning trees, 
analogous to [Sutherland 73], provided part of the 
supporting evidence for this thesis. [Chin and Feiner 89] 
extended the ideas in [Thibauit and Naylor 87] to 
generation of shadows. Concurrent with their work, we 
developed set operations on partitioning trees [Naylor, 
Amanatides and Thibauit 90], which then enabled us to 
implement the work described in this paper. However, 
we had originally conceived of our solutions in terms of 
screen-space using 2D trees, rather than the model- 
space approach with 3D trees in [Chin and Feiner 89]. 


trivially rectified by mapping the far plane to 0, and 
the near plane to -1 if in a left-handed system or to 
+ 1 if in a right-handed system. 

Now let us compare our algorithm, using a 
near-to-far ordering, both to the Weiler and 
Atherton algorithm and to the Sutherland algo¬ 
rithm. They are of course all quite similar. They 
recursively partition space using at each stage a 
binary partitioning set (i.e. any (d-l)-set that 
partitions a d-region into two d-regions), and the 
partitioning is determined by planes containing 
either a polyhedral edge and the center of 
projection and/or by planes of faces. Aside from 
differences arising from the availability of a priority 
ordering (to be discussed below), the relationship of 
our algorithm to Sutherland’s is simple: the order 
of ’’edge selection", i.e. partitioning hyperplane 
selection, is pre-determined by the priority 
ordering of the faces and the tree representing 
each face. And of course, we explicitly construct a 
tree to represent the output. 

The primary difference between our method 
and that of Weiler and Atherton, once again other 
than the priority ordering, is the representation of 
polygons: their representation is a variety of b- 
reps while ours is partitioning trees. This difference 
manifests both in the algorithms for set operation 
(between faces), and the form of the output (a 
graph vs. a tree). It is our contention that the set 
operation algorithm for b-reps are more 
complicated, slower, and less numerically robust 
than the corresponding algorithm for partitioning 
trees. Some early indication of this is suggested by 
the fact that the original set operation algorithm 
given in [Weiler and Atherton 77], which is based 
on a kind of parity counting of intersections, fails to 
handle co-incident boundaries correctly. A correct 
but more involved solution was presented later in 
[Weiler 80] based on Euler operations. 

If in the case of partitioning trees, the boundary 
is already represented by 2D partitioning trees 
lying in a 3D hyperplane, as discussed in [Naylor 
90a], then their representation in screen space 
either as 2-space entities or as 3-space cylinders is 
trivial, requiring the application of a single affine 
transformation (the inverse of the viewing 
transformation used for points). The tree merging 
algorithm can then be used to form the image tree 
as shown in Figure 1 above. Note that a single 2D 
partitioning tree can represent multiple coplanar 
connected components (faces), and a well built tree 
will generally yield better performance than 
operations on a list of connected components. 

The second and more important difference 
arises from our use of a 3D partitioning tree to 
represent the model and so to generate a priority 
ordering. As a consequence, the algorithm is 
simplified by eliminating the code and execution 
time for the initial approximating depth sort, and 
everything associated with the notion of 
surrounders (detection and ordering). Of greater 
consequence is that the partitioning of faces by any 
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occluded subset of an edge is automatically 
eliminated simply by using a near-to-far ordering. 
Indeed, at any point in the construction, the image 
tree can be interpreted as a 2D-polytope 
representing a visibility mask : interior regions 
correspond to occluded regions and exterior to 
unoccluded regions. Since additions are made only 
to unoccluded regions, any intersection between 
two occluded edges is never computed. 

Exploiting the creation of occluded regions of the 
image plane to reduce computation can be 
enhanced further by maintaining at internal 
regions a membership attribute indicating opacity 
within any region r. This can be either the 
percentage of r that is opaque, i.e. the expected 
value of a point lying in r being occluded (see Figure 
4), or simply a boolean variable indicating whether 
r is fully occluded or not. Maintaining this 
membership value during the insertion of a new 
face amounts to the standard condensation of 
homogeneous regions used in set operations, the 
difference being that a region which is 
homogeneous only with respect to opacity but not 
color is not replaced by a leaf node. Thus, whenever 
an internal region becomes fully opaque, it will 
become a cell of the visibility mask even though a 
subtree remains defining the image within this fully 
occluded region. Consequently, this subtree is 
never again accessed during subsequent processing 
of lower priority faces. Moreover, when the root 
region becomes occluded, rendering ceases. 



Maintaining % occluded at regions 
Figure 4 

This captures very simply the ideas present in 
other work using such masks 2 which require 


2 A recent example of this is [Sharir and Overmars 92], 
which is similar in many ways to our method, although 
it apparently was not implemented. They assume the 
existence of a visibility priority ordering, maintain a 
visibility map (our image tree) and a separate mask 
(our opacity attribute in the image tree). They also rely 
on merging of these. However, instead of adding one 
face at a time, they construct a separate visibility map 
for the next several faces, and then merge this with the 
’'current'* map, improving the worst case performance. 
This idea, if shown to be fruitful, can be easily applied 
to our method, since there is no algorithmic difference 
between a tree for a single face and another temporary 
image tree. 


algorithms comparable to set operations on b-reps, 
and it is the continuous space correlate of pixel 
masks, be they 1-bit per pixel or many bits per 
pixel masks (i.e. sub-pixel masks) [Fiume and 
Fournier 83] [Carpenter 84]. When combined with 
view-volume clipping, an effect is achieved 
somewhat analogous to the culling methods 
presented in [Teller and Sequin 91]. While one 
would presume that their additional preprocessing 
would lead to noticeably less computation to 
generate an image, our method permits a dynamic 
geometric model (and of course none of the 
requisite preprocessing and storage of the resulting 
information). 

There is, however, a notable deficiency with our 
scheme as outline above: the order in which the 
image plane is partitioned is predetermined by the 
visibility ordering. However, the order in which 
hyperplanes are chosen affects significantly the 
"goodness" of the trees, i.e. the efficiency of the 
search structure provided by the tree. We have 
come to realize that an efficient partitioning tree is 
one that represents the set/function as something 
analogous to a sequence of approximations [Naylor 
92]. We have implemented tree construction 
methods employing expected case models for 
various elementary operations and these methods 
produce such trees. What we would then like is to 
reflect within the image trees this effort at 
constructing good trees. To achieve this, instead of 
building the image tree from scratch, we modify (a 
copy of) the existing model tree so that it will 
become a representation in 3-space of the occluded 
and unocculded regions. This can be performed 
equally well in either model-space or screen-space, 
with the afore mentioned caveat that screen-space 
induces a numerically undesirable compression of 
the depth. 

There are several ways to apply this idea; we 
will describe here only the simplest. We still 
traverse the tree in a near-to-far priority order. 
However, after forming the 3D face-beam, instead 
of performing image u face-beam we perform 
model u face-beam. As a consequence, entirely 
occluded faces will be removed by this union 
operation, and so will not have their face-beam 
constructed only to find that it is totally occluded, 
as will occur with the previous method. Indeed, 
every subtree of the model-tree that is found to be 
totally occluded will be condensed automatically by 
the union operation to a single cell before it is 
encountered in the priority traversal (Figure 5). 

An extremal illustration of the power of this 
approach occurs when an entire object is occluded 
by a single face of another object. The beam for that 
face will "engulf* the object, and it will be reduced to 
a single "occluded" cell. Given our tree construction 
methods, the computation required in such a case is 
comparable to computing the union of the beam 
with a bounding volume of the object, yielding 
constant time elimination of the occluded object. 
Thus, under favorable conditions, the visible 
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surface can be computed in sub-linear time (and 
this is in addition to the typically sub-linear clipping 
of the model to the view volume). More generally, 
this approach exploits during "beam insertion" the 
efficient search structures previously generated 
for each object and the gains from condensing 
homogeneous regions. Equally important, it retains 
a desirable residue of this structure in the resulting 
image tree, and this residue is important for 
efficient execution of any subsequent spatial 
operations, such as those discussed in the next 
section. 



With viewer V f subtree S is occluded by face 
F and is removed by condensation. 

Figure 5 


Utilizing image Trees 

Given an image tree, one can sample it for display. 
There are a number of ways to do this. The simplest 
but most expensive would be to use point 
classification for each pixel to determine its color. 
This would, however, allow one to use non-uniform 
sampling techniques [Mitchell 87] for anti-aliasing. 
A more reasonable alternative would be to classify 
scan-lines. But since parametric representations 
are ideal for scan-conversion, and b-reps are in 
effect such representations, one can use the 
algorithm in [Thibault and Naylor 87] to classify an 
initial b-rep polygon corresponding to the viewport. 
This yields a disjoint set of convex polygons each as 
list of vertices, one for each cell, whose attributes 
are the color of the corresponding cell. Finally, if the 
faces of the polyhedra are given as b-reps, then 
these can be retained in the process that constructs 
the image tree, i.e. during the union operations, as 
described in [Naylor, Amanatides and Thibault 90]. 
Thus, by extracting these from the image tree, one 
obtains an output similar to that generated by the 
b-rep based algorithms, viz. a set of convex 
polygons each represented by a list of vertices. 

Since only the visible surfaces are scan- 
converted, texture mapping and per-pixel 
illumination calculations (Phong shading) will be 
computed only for visible pixels. In addition, 
transparency is calculated between polygons 
rather than repeatedly for each pixel, and so can be 
provided on systems that do not have the requisite 
pixel-level hardware. The accuracy of anti-aliasing 
can be improved significantly, since the visible 


surface is represented at the resolution provided 
by floating point, which provides a much higher 
degree of accuracy than is practical with discrete 
space. Polygonal edges can be filtered using either 
continuous or discrete space representations of the 
filter, with the results being accumulated in the 
fame buffer using calculations analogous to those 
described for transparency. This then provides the 
continuous space version of sub-pixel mask 
techniques for anti-aliasing presented in [Fiume 
and Fournier 83] [Carpenter 84] and [Abram, 
Westover and Whitted 85], and no per-pixel list of 
micro-polygons with an approximating depth-sort 
is needed as in [Carpenter 84]. It is also a more 
efficient form of the per-pixel "analytic" approach in 
[Catmull 78] which relied on Sutherland's algorithm 
for visible surfaces. And for line drawings on B&W 
printers and displays, the visible edges can be used. 

As discussed in the introduction, an immediate 
advantage of continuous space representations is 
that affine transformations can be applied with 
ease. Images can be scaled by (S x , Sy, Sz) 
corresponding to a model-space scaling of ( S x , Sy, 
1/Sz )• A rotation of an image about the screen- 
space z-axis by 0 is comparable to a rotation by 0 
about the model-space image of this axis, which is 
the axis through the center of projection and 
orthogonal to the projection plane. A translation of 
the image by ( T x , Ty ) is equivalent to a shearing 
with respect to this same model-space axis by ( Sh x 
= T x , Shy = Ty ). If the perspective is not too severe, 
then this approximates a similar translation in 
model-space. For defining texture on a surface, a 2D 
image tree can be affinely transformed in order to 
map it into screen-space and then sampled (note 
that transforming hyperplanes into screen-space 
requires no "perspective division", but only an 
affine transformation). In either case, no 
quantization artifacts, such as enlarged pixels or 
blurred edges, occur. 

Image trees can also be used in subsequent 
continuous space operations. As noted above, this 
provides a continuous space version of what has 
been represented in -discrete space by a rgbaz 
buffer. Therefore, compositing operations can be 
performed on 3D images as discussed in [Duff 85] (in 
that work, the space is discrete). These operations 
can be interpreted as set operations with blending. 
More specifically we have the following equality: 

image( A <set op> B ) = 

image( A ) <set op> image( B ), <set op> e (u, n} 

And set difference can be used for masking. Since 
our 3D image trees are of the same data type as any 
other of our geometric sets, the previously devel¬ 
oped set operations can give us compositing immed¬ 
iately. Blending is provided by the same mechanism 
that provides non-refractive transparency. Using 
3D instead of 2D images frees the compositing from 
being simply a layering of images on top of each 


Graphics Interface ’92 




208 


other, as in the case for traditional cell animation or 
video games; i.e. visibility is not restricted to a total 
order on the individual images, instead they may 
be interleaved. 

While compositing has not been associated with 
interactive 3D graphics, consider the rather likely 
situation in which a user has a model comprised of a 
collection of objects. Typically, the user will engage 
in modification of only one object at any given time 
while the view remains stationary. Then an image 
tree can be constructed for the static objects once 
at the beginning of this interaction, and the image 
of the model is generated by 

image{ mode! ) = image( static-objects ) u 
image( dynamic-object ) 

This will yield more benefits the greater the 
number of static objects, the greater the amount of 
occlusion, and the greater the duration between 
selecting a new dynamic object. (For those readers 
familiar with random-scan display systems, each 
image tree is analogous to a segment.) 

Additionally, it is possible to redraw into a 
frame-buffer only those faces whose visibility has 
changed between successive frames. To do this, one 
needs to maintain in the static-object’s image tree a 
frame index at each node v. This will indicate the 
last frame in which the subtree rooted at v was 
changed by the union with the dynamic-object 
image tree. The drawing process needs to traverse 
only those subtrees which have changed in the 
current frame or else in the immediately prior 
frame so that the image of previously but no longer 
occluded static-object faces can be redrawn. This 
then provides a simple means of exploiting temporal 
correlation (frame-to-frame coherence) in this 
particular setting, i.e. static view and relatively few 
moving objects. 

Visibility computations are, of course, crucial in 
the evaluation of all light transport equations. The 
equivalence between visible surface and shadow 
computations was recognized at a fairly early stage. 
Thus, our model-modification method can be used 
to partition model space into regions that are 
homogeneous in the number of lights visible from 
any point in that region, which then provides a way 
to classify any other set to determine its light- 
source visibility and simultaneously detect 
collisions. For global illumination, a well established 
technique is the hemicube method [Cohen and 
Greenberg 85] which for each surface element 
projects the scene onto a half-cube whose surface 
has been partitioned by a grid, and visibility is 
approximated within each grid-cell at the midpoint. 
Image trees provide an alternative to this grid. For 
each face of the hemicube, one can use our methods 
to represent the image. And instead of approx¬ 
imating the form factors discretely, the transport 
can be computed exactly using contour integration 
[Nishita and Nakamae 85] [Campbell 91]. This then 
leads to a global illumination algorithm with certain 
similarities to that of [Campbell 91] which is also 


based on partitioning trees. Similarly, image trees 
can be used to implement light buffers [Haines and 
Greenberg 86], once again, with exact rather than 
approximate visibility, yielding a significantly sim¬ 
plified methodology. 

Generating image trees from 3D models 
provides a potentially important companion to our 
work on a discrete-to-continuous transform in 
which a pixel array representation is converted 
into a corresponding partitioning tree repre¬ 
sentation [Rahda et al 91]. Currently, we can solve 
no more than the segmentation problem; texture 
representation remains an open issue. However, 
this may be enough for certain applications. 
Consider a robotics application in which the 
constituents of an external environment are known 
a priori and for which a geometric model has been 
constructed. The problem is to maintain a cor¬ 
relation between an internal geometric model and 
the dynamic external physical state, given an initial 
correlated state. One could construct two image 
trees, one from the discrete image provided by a 
camera, and the other from the current view of the 
geometric model. These then could be correlated by 
an iterative process using affine transformations, 
set operations (symmetric difference) and calcu¬ 
lation of moments, and in doing so determine how 
the geometric model should be updated. It may also 
be possible to use image trees in template matching. 

Examples 

Pictures 1-5 provide a few examples of generating 
image trees. The number of faces for pictures 1-3 
are given below for each of the three rendering 
methods: painter’s algorithm (method 1), creating 
a new image tree (method 2), and modifying the 
model tree (method 3). For the phone handset, the 
difference between method 3 and the number of 
front-facing polygons is due to the fact that the 
polygons forming the sound transmitting holes in 
the hand set are contained in fully occluded 
subtrees which are condensed to a single cell (the 


tree has 

also been clipped). 


object 

method 1 

method 2 

method 3 

head 

705 

1982 

703 

shuttle 

499 

1100 

523 

phone 

432 

353 

141 


In Picture 4, we have composited two 3D image 
trees using a union operation. Picture 5 shows a 
skewed view of this revealing the solid nature of the 
images (the full boundary of the solid images has 
been generated only for the purpose of illustrating 
their 3D nature). 

Appendix 

Binary space partitioning trees , also called bsp trees 
or partitioning trees , are defined by a generating 
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algorithm, and for this only one operation is 
required: binary partitioning by a hyperplane of a 
region in a d-dimensional continuous space, d > 0. 
Figure A.l illustrates this. Given a homogeneous 
open region r, a hyperplane h that intersects r is 
chosen using some criteria. Then h is used to induce 
a binary partitioning on r that generates two new 
d-dimensional regions, r+=rn h+ and r“ = r n 
h", where h + and h* are the positive and negative 
open halfspaces of h respectively. Also, generated is 
a (d-l)-dimensional region r° = r n h, called a 
sub-hyperplane (abbr. as shp). Thus r = r+ u r‘ u 
r°= (r n h+)u (rn hr) u (r n h). Any of these 
new unpartitioned homogeneous regions can be 
partitioned similarly, and so on recursively. When 
the process is terminated, the remaining un¬ 
partitioned regions, called cells , together with the 
sub-hyperplanes forms a partitioning of the initial 
region. (In figure A.l, the cells are labeled with 
numbers and the sub-hyperplanes with letters.) 


r O 


Initial region and tree 


A 


First binary New tree 
partitioning 




Spatial partitioning 


/ A \ 

A A 
a v 6 

1 2 4 


\ 


Binary tree 


Constructing a partitioning tree 
Figure A.l 

This process, when begun with d-space as the 
initial region, induces a structure on d-space in the 
form of a hierarchical decomposition. A partitioning 
tree is the computational representation of this 
process, and its combinatorial/syntactic form is 
captured by a binary tree. This tree is simply the 
directed graph of an asymmetric relation defined 
on the set of regions generated by the process 
where ri->r2ifr2 was created directly by a 
partitioning of ri. The tree also corresponds to the 
graph of the partial ordering of the regions induced 
by the subset relation. In addition, the tree can be 
interpreted as a type of computation graph by 
interpreting the arcs as intersection operations: 
"moving” a set s contained in a region r and 
partitioned by hyperplane h along a left arc from r 
to r“ can be interpreted as computing s n h", and 
similarly for the right arc. This interpretation 
provides a set theoretic definition of any region r 1 
as the intersection of open halfspaces corre¬ 


sponding to arcs on the path from the root to r\ In 
figure A.l, cell-3 = 2-space r\ A - n B+. 
Consequently, if the initial region is a convex and 
open set, it follows that all regions of the tree are 
convex and open. 

A partitioning tree can provide the basis of a 
computational object for the semantic domain of 
geometric sets . These are subsets of continuous 
spaces of finite dimension for which each point has 
an associated set of attributes (e.g. color). The 
partitioning tree provides an isomorphism between 
certain geometric entities and a combinatorial 
structure manipulated by algorithms; in other 
words, the binary tree is a syntactic entity whose 
intended interpretation, or model (as in Model 
Theory), is a geometric set. In particular, a 
polytope, or collection of polytopes, can be 
represented by associating with each cell a 
membership attribute = { in, out }, dividing the cells 
into in-cells and out-cells . The poly tope may be of 
any topology, including multiple connected com¬ 
ponents, and have a boundary that is non-manifold 
and/or unbounded. All possible trees represent 
some topologically valid poly tope, although if a tree 
is chosen at random, certain subtrees may 
correspond to the empty set or to a homogeneous 
region. This means that every syntactically valid 
tree, i.e. any binary tree with hyperplanes at 
internal nodes and membership attributes at leaf 
nodes, represents a semantically valid polytope. 

For any point in d-space, its e-neighborhood 
with respect to the polytope can be discovered by 
following the paths in the tree to any cell whose 
closure contains the point. This is just the standard 
method of inserting a point into a search tree, with 
the simple extension that whenever a point is found 
to lie on a partitioning hyperplane, both subtrees 
are visited. The cells reached by the traversal are 
exactly those lying in the point's e-neighborhood 
[Thibault and Naylor 87] . 

Any central projection using linear projectors 
(rays) determines a partial ordering, called a 
visibility priority ordering , on the regions of any 
partitioning tree. This ordering depends only upon 
the center of projection. The total priority ordering 
induced by any single ray on the subset of the 
regions it intersects is consistent with this "global" 
partial ordering. The ordering is possible because 
for a ray t and hyperplane h, their intersection is a 
single point, unless t lies in h . This intersection 
point partitions t into near , on and fax subsets. This 
implies that any set in the near-half space of h has 
priority over any sets lying in h which in turn has 
priority of any sets in the far-halfspace. Given a 
partitioning tree representation of polyhedra, 
discovering that the viewing position is in say the 
positive halfspace of a partitioning hyperplane h at 
node v means that all sets represented by the 
positive subtree of v have priority over any sets 
lying in h which then have priority over those sets 
represented by the negative subtree. One can apply 
this local ordering recursively to generate a total 
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priority ordering of all sets represented by the 
tree. (See [Schumacker et al 69] or [Sutherland, 
Sproull and Schumacker 74], and [Fuchs, Kedem 
and Naylor 80] or [Naylor 81]). 

Along similar lines, efficient ray-tracing 
algorithms have been devised [Naylor and Thibault 
86] which exploit both the convex decomposition 
and the inherent hierarchical search structure. 
Calculation of shadows due to point light sources is 
addressed in [Chin and Feiner 89] and due to area 
light sources in [Chin and Feiner 92] and [Campbell 
91]. Use of partitioning trees for global illumination 
calculations can be found in [Fussell and Campbell 
90] [Campbell 91], Algorithms for set operations are 
presented in [Thibault and Naylor 87] and [Naylor, 
Amanatides and Thibault 90], 

In [Naylor 81], it was shown that partitioning 
trees could represent arrangements of hyper- 
planes, and the complexity of the arrangements 
was used to give a bound of 0( nd ) on the size of 
the largest possible partitioning tree formed using 
n hyperplanes in d-space. It was also shown that a 
set of disjoint (d-l)-faces could result in a tree of 
size nd-1 ). In [Paterson and Yao 90] algorithms 
are given for converting a set of non-intersecting 
faces to a partitioning tree of size 0( n d -l ) in time 
0( n d + l ), d > 3, which is reduced to 0( n 2 )and 
0( n 2 ) for 3D. In 2D, the tree size and run time are 
both 0( n log n ). A convex n-gon can be repre¬ 
sented by a tree of size ©( n ) and depth ©( log n ) 
and two such trees can be merged in 0( n log n ) 
[Naylor 92]. Two arbitrary trees each of size n can 
be merged in ©( nd ) for d = 2, 3 or 4 [Naylor, 
Thibault and Amanatides 90]. However, empirical 
results in [Naylor 81] and much subsequent 
experience indicate that polygonal models of real 
objects result in trees much closer to 0( n log n ). 

Partitioning trees are the same computational 
structure as linear decision trees [Rabin 72], which 
have been used to prove lower bounds on various 
problem, e.g. that sorting is £2( n log n ). Another 
application of this structure, concerned primarily 
with representing a finite set of points is called 
polygon trees [Willard 82] or partition trees . 
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Abstract 

The escape-time method was introduced to generate im¬ 
ages of Julia and Mandelbrot sets, then applied to visu¬ 
alize attractors of iterated function systems. This paper 
extends it further to language-restricted iterated function 
systems (LRIFS’s). They generalize the original defini¬ 
tion of IFS’s by providing means for restricting the se¬ 
quences of applicable transformations. The resulting at¬ 
tractors include sets that cannot be generated using or¬ 
dinary IFS’s. The concepts of this paper are expressed 
using the terminology of formal languages and finite au¬ 
tomata. 

Keywords: fractal, iterated function system, escape¬ 
time method, graphics algorithm, formal language, finite 
automaton. 

1. Introduction 

Although mathematicians have explored the properties 
of fractals since the turn of century, they could not visu¬ 
alize the objects of their study without the aid of comput¬ 
ers. Computer graphics made it possible to recognize the 
beauty of fractals, and turned them into an art form [13]. 
Peitgen and Richter [14] perfected and popularized im¬ 
ages of Julia and Mandelbrot sets. Many of them were 
created using the escape-time method. In its original 
setting, it consisted of testing how fast points z outside 
the attractor diverged to infinity while iterating function 
z —► z 2 -f c in the complex plane. The resulting values 
were interpreted as colors in a two-dimensional image, or 
height values in a “fractal landscape” [15, Section 2.7]. 

The escape-time visualization method was extended from 
Julia sets to iterated function systems [12] by Barnsley [2] 
and Prusinkiewicz and Sandness [18], This paper ex¬ 


tends it further to language-restricted iterated function 
systems, introduced in [16]. They generalize the origi¬ 
nal definition of IFS’s by providing means for restrict¬ 
ing the sequences of applicable transformations to a par¬ 
ticular set. The resulting attractors form a larger class 
than those generated using ordinary IFS’s. The defini¬ 
tion of an LRIFS leaves open the mechanism for sequenc¬ 
ing transformations, thus LRIFS’s incorporate the ear¬ 
lier generalizations committed to a particular mechanism, 
such as hierarchical IFS’s [4], sofic systems [1], recurrent 
IFS’s [3], Markov IFS’s [21], mixed IFS’s [5], controlled 
IFS’s [17], and mutually recursive function systems [7, 8]. 
Several other authors considered similar generalizations 
without giving them a name. Visualization of the attrac¬ 
tors of generalized IFS’s has been addressed by Hart [9], 
referring to his earlier results with DeFanti [10]. 

This paper is organized as follows. Sections 2 and 3 sum¬ 
marize the background material related to formal lan¬ 
guages and iterated function systems. Section 4 presents 
the escape-time method for IFS’s in a way suitable 
for further extensions. Section 5 defines the language- 
restricted iterated function systems. The escape-time 
method is extended to LRIFS’s in Section 6. A special 
case of regular languages is considered and illustrated 
using examples in Section 7. Section 8 summarizes the 
results. 


2. Formal languages 

An alphabet V is as a finite nonempty set of symbols or 
letters. A string or word over alphabet V is a finite se¬ 
quence of zero or more letters of V, whereby the same let¬ 
ter may occur several times. The total number of letters 
in a word w is called its lengthy and denoted length(w). 
The word of zero length is called the empty word and 
denoted e. The concatenation of words x = a\a ,2 ... CL m 
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and y — b\b 2 • > .b n is the word formed by extending 
the sequence of symbols x with the sequence y } thus 
xy — <*1^2 • * • (imbibe ., . 6 n . If xy = w then the word x 
is called the prefix of w t denoted x -< w. The remaining 
word y is called the suffix. We assume that the relation -< 
is reflexive, that is, w -< w. The n-fold concatenation of 
a word w with itself is called its n-th power , and denoted 
w n . By definition, w° = c for any w. If w = a\a ,2 .. .a„, 
then the word w R = a n ... a^cti is called the mirror im¬ 
age of w. It can be shown that (xy) R = y R x R for any 
words x and y . 

The set of all words over V is denoted by V*, and the 
set of nonempty words by V+ . A formal language over 
an alphabet V is a set L of words over V , hence L C 
V*. The concatenation and mirror image of words are 
extended to languages as follows: 

LiL 2 = {xy : x E Ly k y E L 2 }, 

L r = : w E L }. 

A language L is prefix extensible if there exists a word 
v £ V+ such that vL C L. In other words, vw E L for 
every word w £ L. The right derivative of a language 
L C V * with respect to a word v £ V* is the language: 

L//v = {w £ V* : vw £ L). 

The set of all prefixes of a language L is called the prefix 
closure of L: 

V(L) = {x£V* .(3w£L)x< w}. 

3. Iterated function systems 

Let (JY, d) be a complete metric space with support X 
and distance function d (in this paper, we will only con¬ 
sider the plane with the Euclidean distance). A function 
F : X — X is called a contraction in X if there is a 
constant r < 1 such that 

d(F(P),F(Q))<rd(P,Q) 

for all P y Q £ X. The parameter r is called the Lipschitz 
constant of F. 

An iterated function system (IFS) in X is a quadruplet 
X =■ (Xy TyVyhy), where: 

• X is the underlying metric space, 

• T is a set of contractions in X , 

• V is an alphabet of contraction labels, 

• h : V -+ T is a labeling function, taking the letters 
of alphabet V to the contractions from T. 


In the literature, an IFS is usually defined as the pair 
(X,T). We extend this definition by specifying the al¬ 
phabet V and the labeling function h to facilitate the 
introduction of language-restricted IFS’s in Section 5. 

The function h is extended to words and languages over 
V using the equations: 

k(a 1 a 2 ...a n ) - h(ai) o h(a 2 ) o ... o h(a n ), 

' h ( L ) = U 

w^L 

where the symbol o denotes function composition, 

xo/ 1 o/ 2 o...o/ n =/ n (...(/ 2 (/ 1 (x)))..)- 

The attractor of an IFS X is the smallest nonempty set 
A C X, closed with respect to all transformations of T\ 
and closed in the set-theoretic sense. Hutchinson showed 
that the attractor of an arbitrary IFS always exists and 
is unique [12]. Consequently, it can be found by selecting 
a point P £ Ay and applying to it all possible sequences 
of transformations from T\ 

•4 = cl(Poh(K*)), 

where the symbol cl represents the set-theoretic closure of 
the argument set. There are several methods for finding 
the initial point P £ A. For example, the fixed point of 
any transformation F £ T is known to belong to A [12]. 
A legible notation for specifying transformations is 
needed while defining particular IFS’s. In this paper 
we express transformations by composing operations of 
translation, rotation, and scaling in an underlying Carte¬ 
sian coordinate system. The following symbols are used: 

• t(a,b) is a translation by vector (a, 6). 

• a(a) is a rotation by (oriented) angle a with respect 
to the origin of the coordinate system. The angles 
are expressed in degrees. 

• is a scaling with respect to the origin of 
the coordinate system: x* = r x x and j/ = r y y. If 

— r y = r, we write s(r) instead of s(r , r). 

For example, Figure 1 shows the attractor of an IFS 
X = ( X , T, Vy h), where the set T consists of two trans¬ 
formations: 

F\ = * 0 r (45), 

F 2 = .(^)or(135)ot(0,l). 
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Figure 1: The dragon curve 


4. The escape-time method 

Consider an IFS X = {X ) Ty V, /i), where all functions 
F G T are invertible. Let h(a) denote the inverse of the 
contraction F = h(a) G T, or h(a) = (h(a))* 1 . The 
function h is extended to words and languages over V in 
a way similar to h : 



• res(Z\ , a) = res(Zi, &) = 0, since Z\ g C , 

• res(^ 2 f a) = 0, since Z 2 0 h(a) = Z 2a G C, 

• res(Z 3 ,a) > res(Z 3 ,&), since ||Z 3o || < ||Z 3 6||. 


Figure 2: Illustration of the residual terms res(Z,a). 


ft(aia 2 ... a n ) = h(ai) o h(a 2 ) o ... o h(a n ), 

h(L) = |J h(w). 

w£L 

A trajectory of a point Q with respect to a word w G V* 
is the set: 

Tr(Q, w) = {Q o h(ar) : x -< it;}. 

The length of w is referred to as the length of the tra¬ 
jectory. The escape-time method for visualizing the at¬ 
tractor of X is based on the following Theorem, proven 
in [18]: 

Theorem 1. (a) If a starting point Q belongs to the 
attractor A of an IFS X, there exists an infinitely long 
trajectory entirely included in A. (b) If the point Q does 
not belong to A, all trajectories diverge to infinity. 


with a higher precision, Hepting et ai [11] introduced a 
residual term that reflects the distance between the last 
point in the escape trajectory Tr(Q, w) and the border 
of circle C : 

E 2 (Q) = 

max^{length(iw) + res(Q o a) : Tr(Q, w) C C}. 

Let Z = Q o h(w). The function res : X X V —► [0, 1) 
is defined as follows: 

res(Z, a) = (1) 

log ft — log 1|^| | if Z 6 C and 
log \\Z o h(a)\\ — log \\Z\\ Zoh{a)tC, 

0 otherwise. 


To estimate the speed with which the divergence occurs, 
we enclose the attractor in a circle. Since attractors of 
IFS’s are bounded, it is always possible to find a circle C 
of a finite radius /£, completely enclosing A. The escape 
time of a point Q £ A is then defined as the length of 
the longest trajectory included in C : 

Ei(Q) = max{length(iy) : Tr(Q, w) C C}. 

According to this definition, the function E\{Q) is 
integer-valued. In order to represent the escape time 


The norm symbol \\Z\\ denotes the distance between 
point Z and the center O of circle C, thus \\Z\\ = 
d(Z y O). The function res (Z, a) has the following prop¬ 
erties (Figure 2): 

• it takes a nonzero value if point Z lies inside the 
circle C and its image Zoh{a) lies outside this circle; 

• it tends to 0 if the point Z approaches the boundary 
of circle C, and to 1 if the image Zoh(a) approaches 
this boundary. 
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Observe that, when the length of the longest trajectory 
included in C is incremented as a result of moving the 
starting point Q towards the attractor, the largest resid¬ 
ual term changes its value from 1 to 0. Consequently, 
Ei(Q) is a continuous function of the position of point 
Q in the domain X\.4. For a formal proof of this prop¬ 
erty see [11]- A justification of the choice of Formula 1 is 
given in the Appendix. 

The escape-time functions E\(Q) and P 2 (<5) are not de¬ 
fined inside the attractor A , as one can find there infinite 
sequences of points remaining in A and therefore remain¬ 
ing in the circle C. In order to make the definition of the 
escape time computationally effective, we evaluate the 
escape trajectories up to a predefined maximum length 
m. The escape-time functions, limited in this way, can 
be computed in the entire space X using the following 
formulae: 

Ri(Q,rn) = 

{ 0 if Q £ C or m = 0, 

1 + maxfF^Q o h(a), m — 1)} otherwise. 


E 2 (Q, m) = 

( 0 if Q & C or m = 0, 


I 


/ (n u »fQGC, m>0,and 

max{res(c?,a)} ^ r/ v w r n ^ \r 
aev 1 v n Q o h(a) g C for all a € V, 

1 + max{i£ 2 (<3 o A(a), m — 1)} otherwise. 

a£V 


It is intuitively clear that E_i(Q,rn) = E\ for all 
points Q with the escape time E\(Q) less than m, 
since the recursive formula evaluates step-by-step the 
same trajectories as its non-recursive counterpart. Sim¬ 
ilarly, P 2 (<3,m) = ^2 (Q) for all points Q such that 
i? 2 (Q) < Tn. Rigorous proofs of these equalities can be 
carried out by induction on m. 

Figure 3 visualizes the dragon curve from Figure 1 us¬ 
ing the continuous escape-time function 2? 2 (Q) m )- The 
inverse functions are: 



Figure 3: The dragon curve visualized using the 
escape-time method 


h form an “ordinary” IFS, and L C V* is a language over 
the alphabet V . 

Consider a starting point P that belongs to the attractor 
A of the IFS I, and let Al{P) denote the closure of the 
image of P with respect to the transformations h(L). 
The following inclusion holds: 

A l (P) = cl (P o h(L)) C cl(P O h(V*)) = A . 

Thus, the set Al(F) generated by the LRIFS 2x with 
the starting point P E A is a subset of the attractor A- 

For example, consider an LRIFS Tl = ( X } T, V, ft, L), 
where: 

• the space X is the plane, 

• the IFS T consists of four transformations: 

Pi = s( 0.5), 

F 2 = s { 0.5) o <(0,0.5), 

F 3 = 5(0.5) o r(45) o<(0,1), 

F a = 5(0.5) or(-45)o *(0,1), 


Ff 1 = r(—45) o s(V2), 

Ff 1 = <(0,-l)or(-135)os( v / 2). 

It is assumed that the circle C has radius R equal to 5, 
and the limit m is equal to 20. The values of function 
P 2 (Q> m ) are interpreted as a height field. 

5. Language-restricted IFS’s 

A language-restricted iterated function system (LRIFS) 
is a quintuplet Xi = (X, P, V, h, L), where X, P, V", and 


• the alphabet V consists of four letters a, 6, c, d> 

• the homomorphism h is defined by: 

h(a) = F l} *.(<0 = ^2, h(c) = F 3 , h(d) = F 4 , 

• the language L consists of words in which no letter 
c or d is followed by an a or b. 1 

1 Thus, L is defined by the regular expression: 

L = (oU f>)*(cU d)*. 
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Figure 4: Attractor A and its subset Al(P) 


Figure 4 compares the attractor A of the IFS X with the 
set Al (P) generated by the LRIFS Xl using the starting 
point P = (0,0). Clearly, the branching structure of 
Figure (b) is a subset of the original attractor (a). 

In general, the set Al(P) depends on the choice of the 
starting point P. Nevertheless, if the language L is pre¬ 
fix extensible (i.e., vL C L ), the smallest set At does 
exist and can be found as cl(Po °/i(L)), where Po is the 
invariant point of the transformation h(v). This results 
from the following inclusions, satisfied for any P £ X: 

d(P 0 o h(L)) = cl(( lim P o h{v n )) o h(L)) 

n —*oo 

= lim cl(P o h(v n L)) C cl(P o h{L)). 

n—*■00 x 

The limits are calculated in the space of all closed 
nonempty bounded subsets of the space X with the Haus- 
dorff metric [16]. By analogy with the “ordinary” IFS’s, 
we call Al the attractor of the LRIFS Xl . 


6. The escape-time method for LRIFS’s 

While extending the escape-time method to LRIFS’s, we 
consider mirror images of words and use the following 
lemma. 

Lemma. Consider IFS X = (X,,F, V ) A), and let all 
functions F = A(a) £ T be invertible. Then for any 
word w £ V*, the equality h(w) = holds. 

Proof. The set T forms a group of transformations 
with the operations of function composition and inver¬ 
sion, thus ( F{ oFj )- 1 = Fj~ l o pr 1 for any Fj, Fj £ T 
Consequently, the following equalities are true for any 
word w = aia 2 .. .a n £ K*: 

h(w) = A(aia 2 ...a n ) 

= ft(ai) o A(a 2 ) o ... o A(a n ) 

= (h(a l ))- l o(h(a 2 ))- 1 o...o(h(a n ))- 1 

= {h(a n )o...oh(a 2 )oh(ai ))~ 1 
= (h(a n ...a 2 a 1 ))~ 1 =(h(w R ))- 1 . □ 


The escape-time method for LRIFS’s is based on the fol¬ 
lowing extension of Theorem 1 from Section 4: 

Theorem 2. Consider an LRIFS Xl = (X, Ty V, A, L) y 
and assume that the language L is prefix extensible, 
vL C L. Denote by A the attractor of the IFS 
(X, T^ V y h) } and by Al the attractor of Xl- (a) If a 
starting point Q £ X belongs to the attractor Al i then 
for any n > 0 there exists a word w in the prefix closure 
V(L r ) such that length(w) > n and Tr(Q,u;) C A. 
(b) If the point Q does not belong to Al > all trajecto¬ 
ries Tr(Q,it;) with w £ V{L R ) diverge to infinity as 
length(tn) —* oo. 

Proof, (a) Let Pq denote the invariant point of the 
transformation h(v). According to the definition of the 
attractor Al> there exists a word y £ L such that 
Po 0 h(y) = Q. 2 Since P 0 = Pq o h(v)y the equality 
Pq o h(v l y) = Q holds for any i > 0. Let i satisfy the 
inequality length(i; l y) > n, and w = (v l y) R . The word 
w belongs to L R and henceforth to V(L R )y has length 
greater than or equal to n, and maps point Q to the point 

PqEA L : 

Q o h(w) = Q o (hty'y))- 1 = PqEAl- 

In order to show that the entire trajectory Tr(Q,u;) is 
included in the attractor Ay let us consider an arbitrary 
partition of the word w into a prefix X\ and a suffix £ 2 ; 
thus X 1 X 2 = w. From the equality 

Q o h(w) = Qo h(xi) o A(x 2 ) = Po 

it follows that 

Q O h(xy) = Po O (A(i 2 )) -1 

= Po O h(x R ) 6 Po o h(V*) C A. 

Since this argument holds for any x -< w, we obtain: 
Tr(<3, w) = {Qo h(x ) \ x ^ w) C A. 

(b) Let C be an arbitrary circle enclosing the attractor A t 
and R denote the radius of C. We have to prove that if 
Q £ Al, there exists a number n > 0 such that for any 
word w £ V(L R ) of length greater then or equal to n, 
the escape trajectory Tr(Q, u;) is not entirely included 
in C. Let D denote the distance between point Q and 
the attractor Al » and r max be the largest Lipschitz con¬ 
stant found among the transformations F £ T, Since 
D > 0 and r max < 1, there exists a number n > 0 
such that 2Rr^ ax < D. Consider an arbitrary word 
w £ V{L R ) with length(tt;) > n, and let wy £ L R y or 

2 Strictly speaking, there exists a word y € L such that 
Po 0 M$/) i s arbitrarily close to Q. 
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y 11 w R G L. Then there exist points pQ y P G Al such 
that Poohfj^w**) = P. We decompose the last equality 
by introducing an intermediate point P f : 

Po o h(y R ) = P' and P' o h(w R ) = P . 

It follows that 

P o h{w) =P' = P 0 o h(y R ) G Po o h(V*) C A. 

The distance between points P and Q is at least JD, and 
the Lipschitz constant of the composite transformation 
A(tt>) = (h(w R ))~ x is at least r“” x , thus 

d(QoKw) t Poh(w))>d{Q 9 P)r-l, 

> Dr- n ax > 2 R. 

Since P o h(w) = P* G A C C, and the distance of 
Q o h(w) from P* is greater than the diameter of C, the 
point Q o h(w) must lie outside of C, or 


RL2(Q> K > m ) = 
f 0 if Q & C or m = 0, 


I max{res(Q,a)} 


if Q G C, m > 0,and 
Q o h(a) £ C for all a G K t 


l + mjtx{E L2 (Qoh(a),I<//a,m- 1)} 

afc/t 

otherwise. 


These formulae can be used for any language K = 
V(L R )t provided that L has the prefix property, as as¬ 
sumed in Theorem 2. The required operations on lan¬ 
guages are particularly simple if L is regular. It can be 
then specified using a finite-state automaton, which re¬ 
duces operations on infinite languages to the operations 
on their finite representations. Details are given in the 
following section. 


7. The application of finite automata 


Tr {Q,w)<ZC. □ 

Theorem 2 reveals an analogy between the escape trajec¬ 
tories of an LRIFS and an ordinary IFS. In both cases we 
find infinitely long trajectories confined to A if the start¬ 
ing point Q belongs to the attractor — respectively Al 
or A. For a point Q outside an attractor, all trajectories 
diverge to infinity as their length increases. However, in 
the case of an ordinary IFS we consider escape trajecto¬ 
ries with respect to all possible words w G V* 9 while in 
the case of an LRIFS the words w are confined to the 
prefix closure K = P(L R ). 

As a result of these observations, we can extend the 
escape-time formulae from Section 4 to LRIFS’s as fol¬ 
lows: 

E Ll (Q) = max{length(w) : Tt(Q, w) C C} y 

w£K 

EL2(Q) = 

max {length(u>) + res(C? o h(w), a) : Tr(Q, w) C C). 

wa£K 

In the recursive counterparts of these functions, the key 
issue is the selection of mappings h(a) that can be ap¬ 
plied in each step. We use the derivatives of the language 
K to find the appropriate letters a at each level of recur¬ 
sion. As previously, m limits the recursion depth. 

E l 1 (Q,I<, m) = 

0 if Q C or m = 0, 

< 1 + max{i?£ 1 (Q o h(a), K//a, m — 1)} 

a£K 

otherwise. 


We start by recalling the necessary notions of the theory 
of finite automata. For the original presentation see [19]. 

A nondeterministic finite^state (Rabin-Scott) automaton 
is a quintuplet: 

M = < VyS.SO.Tyl >, 

where: 

• V is an alphabet, 

• S' is a finite set of states, 

• So € 5 is a distinguished element of S', called the 
initial state, 

• T C S is a distinguished subset of S, called the set 
of final states, 

• I C V x S x S is a state transition relation. 

We often write (a, 5,*) —► s* instead of (a, s t *, s*) G /. 

Finite state automata are commonly represented as di¬ 
rected graphs, with the nodes corresponding to states, 
and arcs representing transitions. The initial state is 
pointed to by a short arrow. The final states are dis¬ 
tinguished by double circles. 

A word w = aia2...a n € V* is accepted by the 
automaton M if there exists a sequence of states 
S0 i5i, 5 2 ,5 n _ i G S and s n G T such that: 

(°1> 5 o) * $i, (d2j5i) * 52, * * * , (fln, 5 n _i) ► S n . 

Thus, w is accepted by M if there exists a directed path 
in the graph of M starting in the initial state Sq, ending 
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in some final state s n , and labeled with the consecutive 
letters of w. The set of all words accepted by an au¬ 
tomaton M. is called the language accepted by M, and 
denoted by L(A4). 

It is known that the mirror image of the language L(M) 
is accepted by the automaton 

M n = (V,SL>{s' 0 },s' 0 ,{so},I R ), 

where I R = 

{(e.So.s*) : s k € T) U {(a, «;,«,•) : (a.s,-,^) e I). 
Thus, the automaton M n is obtained from M. by: 

• creating a new initial state Sq $ S', 

• creating transitions labeled c from s f 0 to all final 
states of M , 

• reversing the directions of all other transitions, 

• making So the unique final state of . 

Given an automaton M defining a language L , the pre¬ 
fix closure V{L R ) is accepted by the automaton 
obtained from A by making all its states final. 

Consider an LRIFS X = {X, T y V, /i, L), where L is ac¬ 
cepted by a given finite automaton M. Using the method 
given above, we can construct the automaton = 

(V, 5, So, 5, 1) that accepts the language K = V(L R ). 
A word w belongs to K if and only if there exists a path 
in starting in So and labeled with the consecu¬ 

tive letters of w. Thus, the recursive computation of the 
derivatives of AT, needed to evaluate functions En and 
&L2\ can be replaced by the recursive construction of 
paths in M. R7> , starting is So. This leads to the follow¬ 
ing recursive definitions: 


E.Mi(Q<Si,m ) = 

0 if Q £ C or m = 0, 

1+ max {E_ Mi (Q oh(a),s i< m ~ 1)} 

otherwise. 

KM2(Q,Si,m) - 

0 if Q C or m = 0, 

if Q € C, m > 0, and 
max {res«),a)} Qoh(a)$C 
i a ’’"" 3 g/ for all (a, s,-,Sj) 6 I, 

l + . max {F M 2 (goA( a ), 5 j ,m-l)} 

k otherwise. 

The evaluation of functions Km\ and Km2 sta rts with 
S; = So. The equivalence of the formulae for Kli anc * 
Km 1 > 35 we ^ 35 Kl2 aR d Km2* cajl be proved by induc¬ 
tion on the maximum path length in . 

Example 1. The following LRIFS X x = (X^uVu 
h\ y L\) was described by Berstel and Abdallah [6]. It is 
assumed that: 

• X is the plane, 

• T\ consists of four transformations: 

F\ = s(0.5) o*(0.0,0.5), 

F 2 = s(0.5) o*(0.5, 0.5), 

F 3 = s(0.5), 

F 4 = s(0.5)of(0.5,0.0), 

• V i = {F l} F 2 ,F 3y F 4 ], 

• hi(Fi) = Fi for i = 1,2,3,4. 
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Figure 6 : (a) The automaton M 2 defining the language L 2 , and (b) the attractor of the LRIFS I 2 


The language L\ is defined using the finite automaton 
M\ shown in Figure 5a, and the corresponding attrac¬ 
tor is given in Figure 5b. The automaton defining the 
language K\ = V(L^) is shown in Figure 5c, with the 
transitions labeled using the inverse transformations of 
T\. The label I indicates the identity transformation, 
associated with the e-transitions of M ^^. Plate 1 (left) 
visualizes the escape time function computed with a re¬ 
cursion depth limit m = 20 , using a bounding circle C 
with radius R = 5. The escape time values are inter¬ 
preted as indices to a color map, arbitrarily divided into 
several ramps. Plate 1 (right) presents the same function 
as a height field. 

Example 2. The LRIFS X 2 considered in this exam¬ 
ple was described by Vrscay [ 20 ], It uses the same set of 
transformations T and the labeling function h as X\ , but 
the language L 2 is different. The automaton M 2 defin¬ 
ing L 2 and the resulting attractor are shown in Figure 6 . 
The escape time function is presented in Plate 2 . 

Example 3. The LRIFS Z 3 , taken from [16], describes 
a leaf-like structure with the alternating and opposite 
branches. The set of transformations is specified below: 


Fi 

= 

s(0.5) 0 <(—0.002,0) 

f 2 

= 

s(0.5)o <(0.002,0) 

f 3 

— 

s( 0 . 5 )o <(-0.002,0.13) 

f 4 

= 

s(0.5)o <(0.002,0.13) 

f 5 

= 

s(0.42) 0 r(45) 

f 6 

ZZ 

s(0.2) 0 r(90) 0 <(-0.05,0.05) 

f 7 

= 

s(0.2)o <(-0.05,0.05) 

Fa 

= 

<(0.3, -0.3) o s(0.74) o <(-0.3,0.3) 

F 9 

= 

s(0.37) o r(—45) o <(0,0.14) 

Fio 

ZZ 

s(0.172) o r(—90) o <(0.05,0.19) 

Fi 1 

= 

s(0.172)o <(0.05,0.19) 

F 12 

= 

<(-0.265, -0.405) o s(0.74) o <(0.265,0.405) 

F 13 

rz 

<(0, —1) o s(0.74) o <(0,1) 


The automaton M 3 defining L 3 and the corresponding 
attractor are shown in Figure 7. The escape time func¬ 
tion is visualized in Plate 3. 

8. Conclusions 

This paper presents methods for computing the escape¬ 
time functions of language-restricted iterated function 
systems. The LRIFS’s generalize the ordinary IFS’s 
by imposing restrictions on the applicable sequences of 
transformations. The escape-time functions can be com¬ 
puted for any set of sequences constituting a prefix- 
extensible formal language L. The computation of the 
escape time involves finding the mirror image L R , deter¬ 
mining the prefix language K = P(L R ), and calculating 
its derivatives. These operations can be performed in 
a simple way if L is regular, using a specification of L 
by a finite automaton. All examples considered in this 
paper refer to this case. It is an open problem whether 
non-regular languages can yield other attractors and vi¬ 
sualizations. 

One could raise a question, whether this paper applies 
computer graphics to visualize an important mathemat¬ 
ical concept, or whether it merely employs mathemat¬ 
ics to create images for the sake of their visual appeal. 
Our motivation falls in both areas — we wanted to ex¬ 
tend the mathematical concept of escape-time functions 
to LRIFS’s, realizing that it is primarily used for image 
synthesis. In addition, we found that the well-established 
theory of automata and formal languages had unexpected 
applications in computer graphics. 
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Figure 7: (a) The automaton M 3 defining the language L 3 , and (b) the attractor of the LRIFS I 3 


Appendix: Justification of the logarithmic 
formula for res(Z, a). 


The continuity of the escape function can also be main¬ 
tained by residual terms other than that given by For¬ 
mula 1, for example: 


{ R-\\Z\\ if ZeC and 

\\Zoh(a)\\-\\Z\\ Zoh(a)tC, 

0 otherwise. 

In order to explain the advantages of Formula 1, let us 
consider an IFS consisting of a single complex function 
F(z) = z/c. By definition, F(z) is a contraction, thus 
|c| > 1 . Given a circle C with radius R and center O 
in the origin of the coordinate system, the integer-valued 
escape-time function E\(z) is equal to: 

Ei(z) = max{n 6 M : ||-2rc n || < R). 

The symbol M represents the set of natural numbers (in¬ 
cluding zero), and the module of a complex number is 
identified with its norm, |zc n | = ||2rc n ||. A continuous 
(and infinitely differentiable) extension of function E\{z) 
is: 

E2(z) = max{u G : ||zc u || < R ), 

where 7 ?. + is the set of non negative real numbers. Obvi¬ 
ously, the value £*2(2) satisfies the equation: 

\\zc E ^\\ = R. 


Consider point Z = zc E, ( z \ By representing £>2(2) as 
a sum E\{z) + res (Z) } we obtain 3 : 

\\ zc Ei(z)+res(Z)\\ _ ||£ C res ^)|| = R. 

Note that c = Zc/Z = F“ 1 (Z)/Z , and take logarithms 
of both sides of the previous equation: 

log \\Z\\ + res(Z)(log \\F~\Z)\\ - log ||Z||) = log R. 


Consequently, 


res(Z) = 


logfl-log||Z|| 

log||F-i(Z)||-log||Z|r 


Although the above reasoning applies to a particular IFS, 
it justifies the use of Function 1 also in other cases. In 
general, the distance between the origin of circle C and 
consecutive points in an escape trajectory tends to grow 
exponentially for large distance values. Consequently, 
Formula 1 minimizes first-order discontinuities in the 
escape-time function, yielding visually pleasing graphi¬ 
cal representations. 
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Plate 1 : The escape time function for the attractor of the LRIFS X \, shown using a color map and as a height field 



Plate 2 : The escape time function for the attractor of the LRIFS X 2 , shown using a color map and as a height field 



Plate 3: The escape time function for the attractor of the LRIFS J 3 , shown using a color map and as a height field 
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John C. Hart 
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Abstract 

The recurrent iterated function system and the L-system 
are two powerful linear fractal models. The main draw¬ 
back of recurrent iterated function systems is a difficulty 
in modeling whereas the main drawback of L-systems is 
inefficient geometry specification. Iterative and recursive 
structures extend the object instancing paradigm, allow¬ 
ing it to model linear fractals. Instancing models render 
faster and are more intuitive to the computer graphics 
com munity. 

A preliminary section briefly introduces the object in¬ 
stancing paradigm and illustrates its ability to model 
linear fractals. Two main sections summarize recurrent 
iterated function systems and L-systems, and provide 
methods with examples for converting such models to 
the object instancing paradigm. Finally, a short epi¬ 
logue describes a particular use of color in the instanc¬ 
ing paradigm and the conclusion outlines directions for 
further research. 

Keywords: Constructive Solid Geometry, L-system, 
Linear Fractal, Object Instancing, Recurrent Iterated 
Function System. 

1 Introduction 

Linear fractals are sets posessing some sort of self-affinity, 
with detail at all levels of magnification. Many of the ex¬ 
amples used to demonstrate the basic concepts of fractal 
geometry are linear fractals, such as those decorating the 
first part of [Mandelbrot, 1982]. Other accounts have re¬ 
ferred to linear fractals as M self-siinilar/self-affine sets” 
[Mandelbrot, 1982], “recurrent sets” [Dekking, 1982] and 
“graftals” [Smith, 1984]. 

The two most common linear fractal models are the 
recurrent iterated function system (RIFS) and the Lin- 
denmeyer parallel graph grammar (L-system). 

The iterated function system (IFS) model was used to 
investigate the use of linear fractals in image synthesis 
in [Demko et a/., 1985; Barnsley et a/., 1988]. Their 
rendering method used, often quite sophisticated, point 
clouds. An extension to the IFS model has been intro¬ 


duced and examined in various forms, under the adjec¬ 
tives “recurrent” [Dekking, 1982; Barnsley et a/., 1989; 
Cabrelli et a/., 1991], “Markov” [Womack, 1989], “con¬ 
trolled” [Prusinkiewicz & Lindenmayer, 1990], “language 
restricted” [Prusinkiewicz &; Hammel, 1991] and “hierar¬ 
chical” [Peitgen et a/., 1991]. Subtle differences in nomen¬ 
clature and form have caused this heterogeneous develop¬ 
ment. Moreover, modeling objects using the RIFS model 
is difficult, and each of the above variations on the RIFS 
theme have distinct advantages for this task. 

L-systems are parallel-graph grammars. Many have ex¬ 
plored their applications in natural image synthesis, e.g. 
[Smith, 1984; Prusinkiewicz et a/., 1988; Prusinkiewicz 
&; Lindenmayer, 1990], usually via the turtle graphics 
paradigm [Abelson &: diSessa, 1982]. The L-system model 
provides a straightforward and powerful technique for 
modeling natural objects, though the geometry it pro¬ 
duces using the turtle paradigm is not organized effi¬ 
ciently for rendering. 

Both models, in many cases, can be simulated using 
the object instancing paradigm, which makes them more 
efficient for rendering. We begin with a review of the 
object instancing paradigm. Descriptions of the recurrent 
iterated function system and the L-system models follow, 
and each is completed with instructions for the translation 
of the model to the object instancing paradigm. 

2 The Object Instancing Paradigm 

Object instancing is a modeling technique that permits ef¬ 
ficient internal representations of redundant objects. Of¬ 
ten, complicated objects consist of many identical compo¬ 
nents. Object instancing capitalizes on the redundancy of 
these identical components. A good example of an ideal 
situation is seen in the animation “Megacycles” [Ama- 
natides &: Mitchell, 1989]. 

The object instancing paradigm consists of two basic 
kinds of objects: primitives and instances. 

A primitive is an indivisible rendering atom. 
Boundary-representation systems use polygons or spline 
surfaces as primitives whereas solid modeling systems use 
natural quadrics, planes, tori and superquadrics as prim¬ 
itives. 
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An instance consists of a pointer to another object (a 
primitive or another instance, called the “master” object) 
and an affine transformation. The instance reproduces 
the referenced object, deformed by its affine transforma¬ 
tion. [Sutherland, 1963]. 

An object may also be a CSG set-theoretic operation 
such as a union, intersection or difference, Their operands 
are pointers to other operations, and so, they too instance 
other objects. (For the purpose of this discussion, we will 
focus on the union operation.) 

The object instancing paradigm contains another dis¬ 
tinction: the global coordinate system where the final 
scene is assembled versus the local modeling coordinate 
systems where the canonical scene components are con¬ 
structed. A canonical object is a shape in its default size, 
proportions and location. Canonical objects do not ap¬ 
pear in the rendered scene. A final instancing operation 
is needed to transform the canonical objects from their 
local modeling-space coordinate systems into the global 
rendering-space coordinate system [Roth, 1982]. 

Primitives are canonical objects. The canonical sphere 
is the unit sphere centered at the origin; the canonical 
plane is the y = 0 plane intersecting the origin; the canon¬ 
ical cylinder is the cyclinder of unit radius and height 
centered and oriented along the interval [0,1] of the y- 
axis; the canonical cone is likewise defined, pointing in 
the positive y direction. One can render a canonical ob¬ 
ject in its default state by instancing it from canonical 
space into rendering space using the identity transforma¬ 
tion matrix. When modeling, instancing takes canoni¬ 
cal objects to canonical objects. Upon completion of the 
modeling process, an instance operation converts the final 
canonical object into a rendered object for image synthe¬ 
sis. 


2.1 Examples 

Figure 1 (left) shows an approximation to Sierpinski’s gas¬ 
ket by nine triangles. Figure 1 (right) shows a possible 
CSG instancing model with a tree topology. The circles 
denote CSG union operations of three instances. 



Figure 1: An approximation of Sierpinski’s gasket 
(left) modeled with a tree topology (right). 


In the absolute specification paradigm, each triangle is 
individually specified, commonly by its three endpoints. 
Further development of this tree-structured model of Sier¬ 


pinski’s gasket requires the specification of 3 n_1 individ¬ 
ual triangles, where n is the height of the tree. Hence, the 
size of an absolute specification model of a linear fractal 
grows exponentially as the detail increases. 

The subtrees of Figure 1 (right) are redundant. Us¬ 
ing the instancing paradigm, one creates the same gasket 
approximation, Figure 2 (left), using a graph topology, 
Figure 2 (right), with fewer nodes. Here, the small circles 
denote instancing operations. 



Figure 2: An approximation of Sierpinski’s gasket 
(left) modeled with a graph topology (right). 

The textual specification for such an instanced model 
might look like the following. 


canonical object level_l_up 
triangle 
scale: 0.5, 0.5 
translate: 0.0, 0.5 

} 

canonical object level_l_left { 
triangle 
scale: 0.5, 0.5 
translate: -0.433, -0.25 

> 

canonical object level.l_right { 
triangle 
scale: 0.5, 0.5 
translate: 0.433, -0.25 

> 

canonical object level_2_up { 

union: level.l_up, level.1.1eft, level.i.right 
scale: 0.5, 0.5 
translate: 0.0, 0.5 


canonical object level_2_left { 

union: level.l.up, level.l.left, level.l.right 
scale: 0.5, 0.5 
translate: -0.433, -0.25 


canonical object level_2.right { 

union: level.l.up, level.l.left, level.l.right 
scale: 0.5, 0.5 
translate: 0.433, -0.25 


object level.top { 
union: level.2.up» 

> 


level.2.1ef t, level.2_right 
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In the previous object instancing example there is no 
difference between a level two object such as level.2_up 
and a level one object such as level.l.up other than their 
name and the objects they instance. This is called itera¬ 
tive instancing and the size of its model grows linearly as 
the level of detail increases. 

Instancing graphs are usually acyclic. A directed cycle 
in an instancing graph means an object has somehow in¬ 
stanced itself, and if rendered, may appear infinitely many 
times in the scene. This is called recursive instancing. 

As shown in [Hart & DeFanti, 1991; Hart, 1991a], one 
can extend the object instancing paradigm to handle re¬ 
cursive instancing by culling the instancing process when 
a predefined global bounding volume become sufficiently 
small. This requires any instancing loop to be contractive, 
otherwise the instances would not converge. 

For example, consider the limit of the shape described 
in Figures 1 and 2. Allowing recursive instancing, the 
hierarchy becomes cyclic and describes a linear fractal, 
specifically Sierpinski’s gasket, in Figure 3. 


Figure 3: Sierpinski’s gasket (left) modeled with a 

cyclic graph topology (right). 

This linear fractal is specified, more tersely than before, 

with the following commands. 

canonical object up { 

union: up, left, right 
scale: 0.5, 0.5 
translate: 0.0, 0.5 

canonical object left { 

union: up, left, right 
scale: 0.5, 0.5 
translate: -0.433, -0.25 

} 

canonical object right { 

union: up, left, right 
scale: 0.5, 0.5 
translate: 0.433, -0.25 

} 


3 The RIFS Model 

A (hyperbolic) recurrent iterated function system 
{{wi}iLi,G) consists of a finite set of N affine contrac¬ 
tions t Vi, and an N- vertex weakly-connected digraph G. 
Each vertex of this digraph corresponds to a contrac¬ 
tion and each edge, an allowable contraction composition. 
Here, a digraph G is denoted by an ordered pair (G v , G c ). 
The vertex set G v is a set integers {1... N). The edge set 
G c is a sef. of ordered pairs (i,j), i,j £ G v . such that the 
ordered pair (i, j) £ G e represents a directed edge from 
vertex » to vertex j. 

3.1 The RIFS Attractor 

Of fundamental importance to the study of iterated func¬ 
tion systems as well as recurrent iterated function sys¬ 
tems is the existence of corresponding unique compact 
non-empty invariant limit sets called attractors. One may 
recall the definition of the attractor of an IFS {iui} is the 
unique solution to the recurrence equation 

A= (J Wi(A), ( 1 ) 

as originally shown in [Hutchinson, 1981]. The attractor 
of an RIFS ({w;}, G) is the union of the solution sets A; 
to the recurrence equation 

A ) = IJ v>i(Ai) (2) 

as shown in [Barnsley et a/., 1989]. 

The attractor of an RIFS ({u;j}£Li, G) is a subset of the 
attractor of the IFS Thus, an RIFS is a restric¬ 

tion of an IFS, which results in the elimination of certain 
sections of attractor of the IFS. One example of this is 
the following set of four contractions 

»«(*.»)= (3) 

taken over all combinations of signs. If G is a complete 
digraph of four vertices then the attractor of the RIFS 
({tn,},G) is the attractor of the IFS {wi} f namely the 
square [—1,1] x [—1,1]. However, if G contains all edges 
except those of the type («,»'), then the same map may 
not be applied twice in a row. The attractor of this RIFS, 
from [Cabrelli et a/., 1991], is the fractal pound sign shown 
in Figure 4. 




object top { 

union: up, left, right 


A cyclic instancing graph represents inhnitly high levels 
of detail. Hence, the size of a recursive instancing model 
remains constant as the level of detail increases. 


3.2 Modeling 

One models an object as a linear fractal by creating a 
RIFS whose attractor approximates the object within 
some degree of accuracy. Equation ( 1 ) suggests that ap¬ 
proximating an object with an IFS is as simple as finding 
a set of contractions that take the entire object to each 
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Figure 4: The fractal pound sign. 


of its components — the so-called “collage theorem” phi¬ 
losophy of modeling [Barnsley et a/., 1986]. 

Equation (2) suggests a similar property, though for an 
RIFS model, the object is simulated by finding contrac¬ 
tions that take parts of the object to smaller parts. In 
the previous example, the fractal pound sign consists of 
four images of three-fourths of the original pound sign. 
One can see the self-similarity of the fractal pound sign 
by comparing its first quadrant with the rest of the set. 

Ordinarily, this partial self-similarity is quite difficult 
to see in objects. The collage theorem philosophy of mod¬ 
eling requires a self-tiling, which can be easily visualized. 
Its recurrent form, from [Barnsley et ah, 1989], requires a 
partial self-tiling which is significantly more difficult and, 
at best, non-intuitive. Hence, the recurrent collage the¬ 
orem is a somewhat ineffective modeling tool for linear 
fractals, for which there are, unfortunately, few alterna¬ 
tives. 


3.3 Specification via Instancing 

The recursive object instancing paradigm is one alterna¬ 
tive modeling method to the collage theorem. The draw¬ 
back is one must still partially self-tile an object to create 
a linear fractal model of it. The benefit is the paradigm 
incorporates tools familiar to the computer graphics com¬ 
munity. 

Converting an RIFS to an instancing structure requires 
three simple steps. First N canonical instancing objects 
are constructed. Each affine contraction of the RIFS be¬ 
comes the affine transformation matrix of its correspond¬ 
ing instancing object. The control digraph of the RIFS 


dictates the masters of each instance. Finally, a rendered 
object instances the canonical instances along with an ini¬ 
tial bounding volume into rendering space. 

For example, the 3-D pound sign can be specified by 
the following commands. The canonical object’s names 
are abbreviated versions of the instance’s position, from 
lower-left-rear to upper-right-front. 


canonical object llr { 

union: Ilf, lrr, lrf, ulr, ulf, urr, urf 
scale: 0.5, 0.5, 0.5 
translate: -0.5, -0.5, -0.5 

} 

canonical object Ilf { 

union: llr, lrr, lrf, ulr, ulf, urr, urf 
scale: 0.5, 0.5, 0.5 
translate: -0.5, -0.5, 0.5 

} 


canonical object urf { 
union: llr, Ilf, lrr, lrf, ulr, ulf, urr 
scale: 0.5, 0.5, 0.5 
translate: 0.5, 0.5, 0.5 

} 

object pound.sign { 

union: llr, Ilf, lrr, lrf, ulr, ulf, urr, urf 

> 


The object this instancing text models is shown in Fig¬ 
ure 5. 



Figure 5: The 3-D fractal pound sign. 
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4 The L-System Model 

Whereas the RIFS resembles the finite state automaton, 
an L-system is a parallel graph grammar similar in de¬ 
sign to a context-free grammar (CFG). It consists of an 
alphabet, a set of symbols, an initial axiom and a set of 
productions. CFG productions are applied one at a time 
and one per step, usually to the leftmost symbol whereas 
L-system productions are applied in parallel — all sym¬ 
bols are replaced simultaneosly at each step. 

4,1 Turtle Geometry 

The words of an L-system are commonly interpreted, us¬ 
ing turtle geometry [Abelson & diSessa, 1982], as a graph¬ 
ical object. The alphabet for turtle control contains such 
symbols as F which draws a straight line, and + and — 
which turn the turtle left and right by some fixed angle. 
For example, if this fixed angle is 90°, then the word 

F + F-F-F+F-F-F + F-F-F + F-F (4) 

will draw the outline of a plus sign. In [Prusinkiewicz 

Lindenmayer, 1990], a complete parametric language 
is used to developed a sophisticated 3-D turtle graphics 
paradigm. 

Of particular utility are the symbols [ and ]. The left 
bracket pushes the current state of the turtle on a stack 
whereas the right bracket pops this stack, restoring the 
turtle to its previous state. This simulates branching 
structures, allowing the turtle to concentrate on an in¬ 
tricate branch without having to retrace its steps to get 
back to the base of the branch to draw the rest of the 
object. 

For example, the following parametric L-system gen¬ 
erates a ternary-branching tree, from Figure 2.8(a) in 
[Prusinkiewicz & Hammel, 1991], 


A — 

i(^) F( 50) [&(18.95) F(50) A] 



/(94.74) [&(18.95) F(50) A] 



/(132.63) [&(18.95) F(50) /t] 

(5) 

m - 

F(1.109f) 

(6) 

!(«0 - 

!(\/3u>) 

(7) 


where !(•) alters the width of lines (radius of cylinders) by 
the given factor and F( ) draws a line of the given length. 
The rotation symbol &(•) alters the pitch of the turtle; 
the symbol /(•) alters the roll. This L-system begins with 
the word !(1) .F(50) A . 

In the previous example, the shortest length of any 
branch is 50. We can convert this L-system from “enlon- 
gating nodes” to “deacreasing apices,” which makes the 
longest length of any branch 50. This produces the fol¬ 
lowing L-system, a recurrent string, which is equivalent 


to the previous one except for scale, 

A(w,l) - *(u,)F(l)t&(18.95)F(0/l(^. T: ^9)] 

/(94.74) [&(18.95) F(l) 

/(132.63) [&(18.95) F(l ) 

( 8 ) 

This L-system begins with the initial word 
!(1) F{ 50) >1(0.577,45.1). 

4.2 Specification via Instancing 

Conversion from an L-system to an object instancing 
structure is more difficult than converting to an RIFS. 
In [Prusinkiewicz & Lindenmayer, 1990], steps were de¬ 
scribed for converting an L-system model to an RIFS 
model. Hence, using this result with the last section, 
we could convert an L-system indirectly to an instancing 
structure. Instead, we will outline techniques for directly 
converting L-system models to object instancing models. 

In general, converting L-system productions to in¬ 
stances requires thorough knowledge of the current state 
of the turtle at each step in the production. With the 
exception of the left bracket, each of the symbols on the 
right-hand side of the productions affect the state of the 
turtle. Furthermore, the line-drawing symbols cause a ge¬ 
ometric addition to the scene as well as a change to the 
state of the turtle. 

Each line-drawing symbol, e.g. F, can be interpreted 
as an instance of some canonical 3-D line, such as a cylin¬ 
der. The cumulative effect of the symbols preceding the 
line-drawing symbol determine that line’s size, position 
and orientation. Any bracket delimited symbols can be 
ignored. The affine transformation matrix of the line- 
instance is the product of the transformation matrices 
associated with the symbols preceding the current line- 
drawing symbol. For example, a + corresponds to a ro¬ 
tation and an F (preceding the current symbols) corre¬ 
sponds to a translation. 

Symbols denoting other productions are particularly 
difficult. These symbols can be replaced by an equivalent 
sequence of symbols denoting the production’s cumulative 
affect on the turtle’s state. Assessment of this cumulative 
affect can be quite cumbersome, particularly when the 
production is recursive. 

Fortunately, in most botanical models, recursion is used 
to simulate branching patterns. Rather than having the 
turtle retracing its steps back to the branch root after 
drawing the branch, the branch production is bracketed. 
Hence, if a bracketed branch production symbol precedes 
the current line-drawing symbol, it can be ignored and its 
cumulatice affect on the turtle’s state need not be com¬ 
puted. 
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The following instancing structure approximates the L- 
systera (8). 

canonical object trunk { 
cylinder 

canonical object A { 

union: trunk, branch.l, branch_2, branch_3 
scale: 0.902, 0.902, 0.902 
translate: 0, 1, 0 

> 

canonical object branchy1 { 
union: trunk, A 
rotate: x,18.95 
translate: 0, 1, 0 

> 

canonical object branch_2 { 
union: trunk, A 
rotate: x, 18.95 
rotate: y, 94.74 
translate: 0, 1, 0 

> 

canonical object branch_3 { 
union: trunk, A 
rotate: x, 18.95 
rotate: y, 94.74 
rotate: y, 132.63 
translate: 0, 1, 0 

> 

object tree { 
union: trunk, A 

} 

This approximation of the L-system suffers from two 
major shortcomings. 

First, the branch width changing factor is not repre¬ 
sented in this model; the scaling transformation in the 
object named A is uniform. One might be tempted to 
use the scaling command: “scale: 0.577, 0.902, 0.577, w 
which would produce a properly proportioned branch seg¬ 
ment, but the rest of its sub-branches would be artificially 
skewed along the original branch’s major axis. 

Second, this L-system will be evaluated to the pixel 
level, producing very fine branches but no leaves. A 
more realistic model would terminate well before the limit 
structure, allowing the addition of leaves to the ends of 
the branches. 

Both of these problems are solved by duplicating the 
above instancing text several times, into levels. This is 
equivalent to the use of conditions in a parametric L- 
system. The resulting instancing structure might look 
like the following, in a parametric instancing scheme. 

canonical object A(0) { 
union: leaf 
translate: 0, 1, 0 

> 


canonical object A(n) { 
union: trunk(ll-n), branch_l(n), 
branch_2(n), branch_3 (n) 
scale: 0.902, 0.902, 0.902 
^ translate: 0, 1, 0 

canonical object branch_l(n) { 
union: trunk(ll-n), A(n-l) 
rotate: x,18.95 
translate: 0, 1, 0 

> 

canonical object branch_2(n) { 
union: trunk(ll-n), A(n-1) 
rotate: x, 18.95 
rotate: y, 94.74 
^ translate: 0, 1, 0 

canonical object branch_3(n) { 
union: trunk(ll-n), A(n-l) 
rotate: x, 18.95 
rotate: y, 94.74 
rotate: y, 132.63 
translate: 0, 1, 0 

canonical object trunk(0) { 
instance: cylinder 


canonical object trunk(n) { 
instance: trunk(n-1) 

^ scale: 0.64, 1, 0.64 

object tree { 
union: trunk(0), A(10) 

In this case, the factor 0.64 unscales the radius of each 
branch by the value 0.902, then rescales it by the proper 
value 0.577 without affecting the geometry of its sub- 
branches. 

The above parametric instancing specification can be 
implemented for standard instancing interpreters by enu¬ 
merating each object for all possible values of n, incorpo¬ 
rating the value n into the name of the object. For exam¬ 
ple, instead of trunk(8) instancing trunk(7), the object 
trunk_level_8 would instance trunk_level_7. The lat¬ 
ter is significantly more verbose, but is still much less than 
absolute specification of every branch. In fact, paramet¬ 
ric instancing specifies in constant steps what standard 
instancing specifies in logarithimic steps — what abso¬ 
lute specification specifies in linear steps. 

A similar L-system was used to model the tree in Fig¬ 
ure 6 (compare the trees in [Kay & Kajiya, 1986]). The 
branches are iteratively instanced cylinders, but the leaves 
are modeled by a recursive instancing structure derived 
from the IFS leaf model shown in [Demko et a/., 1985]. 
The grass is modeled as iteratively instanced as in [Snyder 
& Barr, 1987], though the grass blades here are cones. 
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Figure 6: Autumn — an L-system tree with IFS 
leaves on instanced grass. 


5 Incorporating Color 

Color is incorporated into the linear fractal instancing 
model by associating a color and a weight with each in¬ 
stance. The color is specified by an RGB triple, the 
weight, by an alpha value. A current color and alpha 
value are maintained for the object during CSG hierar¬ 
chy traversal. During rendering, when an operand is in¬ 
stanced, the instance tints the color of its parent in the 
hierarchy by its own individual color. 

Formally, this tinting is based on the transfer functions 
from image compositing [Porter & Duff, 1984], 

c = c 0 + or»(l — a 0 )ci, (9) 

a = arc + ar,(l - ar 0 ), (10) 

where c, or are the resulting color and alpha values, c p , ot p 
are those of the parent, and Ci, on are the tinting values 
of the instance. 

The eight instances used to generate Figure 5 had the 
eight associated colors: black, red, green, yellow, blue, 
magenta, cyan and white, each with an alpha value of 
one-half. This coloring scheme was also used in [Hart & 
Das, 1991; Hart, 1991b] to better clarify otherwise imper¬ 
ceptible features of highly intricate linear fractal surfaces. 

The coloring of the elm trees in the Fractal Forest [Hart 

6 DeFanti, 1991; Hart, 1991a] was carefully adjusted so 
that the trunk and most of the visible branches were 
brown but the leaves (actually tiny branches) were green. 
Assigning the trunk a brown tint with a high alpha value 
and the branches a green color with a low alpha value 
achieved this goal. Thus, once a trunk instance was used, 


all of its decendents would be mostly brown whereas green 
points in the set arose from almost exclusive application 
of branch instances. 


6 Conclusion 

With a few enhancements, the standard instancing 
paradigm can be adapted to model linear fractals. This 
paper discussed methods for converting two popular lin¬ 
ear fractal models to the instancing paradigm, for effi¬ 
cient implementation in current computer graphics sys¬ 
tems. Several examples were given, with modeling code 
and resulting images. Additionally, the use of color in 
linear fractal models was documented. 


6.1 Further Research 

As they become better understood, linear fractals are 
becoming more popular for their applications in natural 
modeling and image synthesis. A few problems still re¬ 
main regarding their rendering, such as a rigorous treat¬ 
ment of the shading properties of a fractal surface [Hart, 
1991a]. 

Currently, the most popular open problem regarding 
the modeling of linear fractals is the algorithmic solu¬ 
tion to the inverse problem: given a shape, automatically 
determine the parameters of a linear fractal model that 
simulates the shape within a prescribed accuracy. One 
solution uses a gradient search in the RIFS parameter 
space [Vrscay & Roehrig, 1989], but this method is prone 
to local minima traps. Further enhancement using ge¬ 
netic programming techniques appear promising [Vrscay, 
1991], but much work remains along this specific course 
of attack. 

One of the immediate applications of an automated so¬ 
lution to the inverse problem is image compression. To¬ 
ward this end, block coding methods have been developed 
to solve this particular application of the inverse problem 
[Jacquin, 1991]. Nonetheless, image compression is only 
one tiny application of what could be a very powerful 
result. The ability to automatically model intricate 3-D 
structures with linear fractals would be a major result in 
computer graphics — its research should not end with 
one simple solution geared toward an image compression 
application. 

Finally, the classic object instancing paradigm is not 
fully suited to modeling many of the subtle enhancements 
to the turtle-geometric L-system model, such as tropism 
[Prusinkiewicz & Lindenmayer, 1990] or stochastic vari¬ 
ations [Kajiya, 1983; Bouville, 1985]. More research on 
extensions to the object instancing paradigm would allow 
more efficient modeling and rendering of more realistic 
botanical structures. 
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Abstract 

The problem of computing the intersection of parametric 
and algebraic curves arises in many applications of com¬ 
puter graphics, geometric and solid modeling. Earlier al¬ 
gorithms are based on techniques from Elimination theory 
or subdivision and iteration. The former is however, re¬ 
stricted to low degree curves. This is mainly due to issues 
of efficiency and numerical stability. In this paper we use 
Elimination theory and express the resultant of the equa¬ 
tions of intersection as a matrix determinant. The matrix 
itself rather than its symbolic determinant, a polynomial, 
is used as the representation. The algorithm for inter¬ 
section corresponds to substituting the other equation to 
construct an equivalent matrix such that the intersection 
points can be extracted from the eigenvalues and eigenvec¬ 
tors of the latter. Moreover, the algebraic and geometric 
multiplicities of the eigenvalues give us information about 
the intersection (multiplicity, tangential intersection etc.). 
As a result we are able to accurately compute higher order 
intersections. The main advantage of this approach lies in 
its efficiency and robustness. Moreover, the numerical ac¬ 
curacy of these operations is well understood. For almost 
all cases we are able to compute accurate answers in 64 
bit IEEE floating point arithmetic. 

Keywords: Intersection, Curves, Eigenvalues, Accuracy, 
Robustness. 

1. Introduction 

The problems of computing the intersection of parametric 
and algebraic curves are fundamental to geometric and 
solid modeling. Parametric curves, like B-splines and 
Bezier curves, are extensively used in the modeling sys¬ 
tems and algebraic plane curves are becoming popular as 
well [Hof89, MM89, SP86, Sed89]. Intersection is a primi¬ 
tive operation in the computation of a boundary represen¬ 
tation from a CSG (constructive solid geometry) model in 
a CAD system. Other applications of intersection include 


hidden curve removal for free form surfaces, [EC90]. Al¬ 
gorithms for computing the intersection of these curves 
have been extensively studied in the literature. 

As far as computing the intersection of rational para¬ 
metric curves is concerned, algorithms based on implici- 
tization [Sed83], Bezier subdivision [LR80] and interval 
arithmetic [KM83] are well known. The implicitization 
approach computes the implicit form of one of the curve 
using resultants [Sed83]. Given the implicit represen¬ 
tation of one curve, substitute the second parametriza- 
tion and obtain a univariate polynomial in its parameter. 
The problem of intersection corresponds to computing the 
roots of the resulting polynomial. The Bezier subdivision 
relies on the convex hull property of Bezier curves and de 
Casteljau algorithm for subdividing Bezier curves. The 
resulting algorithm performs a linearly converging binary 
search. It has been improved by more effective use of the 
convex hull property [SWZ89]. The interval arithmetic 
uses an idea similar to subdivision. Each curve is prepro¬ 
cessed to determine its vertical and horizontal tangents, 
and the curve is divided into ‘intervals’ which have vertical 
or horizontal tangents only at the endpoints. 

The relative performance and accuracy of these al¬ 
gorithms is highlighted in [SP86]. In particular, implicit¬ 
ization based approaches are faster as compared to other 
methods for curves of degree up to four. However, their 
relative performance degrades for higher degree curves. 
This is mainly due to issues of numerical stability and their 
effect on the choice of representation and algorithms for 
root finding. For curves of degree greater than three, the 
resulting univariate polynomial has degree 16 or higher. 
The problem of computing real roots of such high degree 
polynomials can be ill-conditioned [Wil59]. Using exact 
arithmetic or symbolic techniques to overcome the numer¬ 
ical problems have a considerable effect on the efficiency of 
the problem and therefore, subdivision based algorithms 
perform better. 

The algorithms for algebraic curve intersection are 
analogous to those of intersecting parametric curves. Re- 
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sultants can be used to eliminate one variable from the 
two equations corresponding to the curves. The problem 
of intersection corresponds to computing roots of the re¬ 
sulting univariate polynomial. This approach causes nu¬ 
merical problems for higher degree curves (greater than 
four). A robust algorithm based on subdivision has been 
presented in [Sed89]. However, resultant based algorithms 
are considered to be the fastest for lower degree curves. 

In many applications, the intersection may be of 
higher order involving tangencies and singular points. 
Such instance are rather common in industrial applica¬ 
tions [MM89]. Most algorithms require special handling 
for tangencies and thereby requiring additional compu¬ 
tation for detecting them. In fact algorithms based on 
subdivision and Newton-type techniques often fail to ac¬ 
curately compute the intersections in such cases. Special 
techniques for computing first order tangential contacts of 
parametric curves are given in [MM89]. [Sed89] presents 
modification of his algorithm for computing all double 
points of an algebraic curve in a triangular domain. How¬ 
ever, no efficient and accurate techniques are known for 
computing higher order intersections for general cases. 

In this paper we present efficient and robust algo¬ 
rithms for intersecting parametric and algebraic curves. 
For parametric curves we implicitize one of the curves 
and represent the implicit form as a matrix determinant . 
However, we do not compute the symbolic determinant 
and express the implicit formulation as a matrix. The 
idea of matrix determinant has been used in [MC91] to 
represent and evaluate the intersection of rational para¬ 
metric surfaces. Given the implicit form, we substitute 
the other parametrization into the matrix formulation and 
use the resulting matrix to construct a numerical matrix 
such that the intersection points can be computed from 
its eigendecompositon. This is in contrast with expand¬ 
ing the symbolic determinant and finding the roots of the 
resulting polynomial. The advantages of this technique 
lie in efficiency, robustness and numerical accuracy . The 
algorithms for computing eigenvalues and eigenvectors of 
a matrix are backward stable and fast implementations are 
available as part of packages like EISPACK and LAPACK 
[GL89, ABB + 90]. Furthermore, we effectively use the al¬ 
gebraic and geometric multiplicities of the eigenvalues to 
determine the exact multiplicity of the intersection. The 
algorithm for intersecting algebraic curves is rather simi¬ 
lar, except the relationship between algebraic and geomet¬ 
ric multiplicities of the eigenvalue and the multiplicity of 
intersection is different. 

The rest of the paper is organized in the following 
manner. In section 2 we present our notation and re¬ 
view techniques from Elimination theory for implicitizing 
parametric curves. Furthermore, we show that the prob¬ 
lems of intersecting parametric and algebraic curves can 
be reduced to computing roots of polynomials expressed 
as matrix determinants. In section 3, we present results 
from linear algebra and numerical analysis being used in 
the algorithm. Section 4 deals with reducing the prob¬ 
lem of root finding to computing the eigendecomposition. 


Given the eigenvalues and eigenvectors, we compute the 
intersection points of parametric curves in the domain of 
interest. We also discuss the performance and robustness 
of the resulting algorithm. Section 5 deals with higher 
order intersections and illustrates the technique with ex¬ 
amples. 


2. Parametric and Algebraic Curves 

A rational Bezier curve is of the form [BBB87]: 


P (<) = (*{<), K(t)) = 


£? =0 WiBi, n {t) * 


0 < t < 1 


where Pi = (Xj, Yi) are the coordinates of a control point, 
Wi is the weight of the control point and f?j fn (t) corre¬ 
sponds to the Bernstein polynomial, Bi,n 


t)*™ 1 **. Rational curves like B-splines can be converted 
into a series of Bezier curves by knot insertion algo¬ 
rithms [BBB87]. Thus, the problem of intersecting ra¬ 
tional curves can be reduced to intersecting Bezier curves. 
Each of these curves is described by its corresponding con¬ 
trol polygon and the curve is always contained in the con¬ 
vex hull of the control points. Therefore, the intersection 
of the convex hull of two such curves is a necessary condi¬ 
tion for the intersection of curves. One such instance has 
been highlighted in Fig. I. 

Algebraic plane curves are generally expressed in 
standard power basis: 


F{x,y) = = 0. 


They can also be represented in Bernstein basis. The 
problem of intersection corresponds to computing the 
common points on such curves in a particular domain. 



Intersection of Bezier curves 


2.1. Resultants 

Given two polynomials in one unknown, their resultant is 
a polynomial in their coefficients. Moreover, the vanish¬ 
ing of the resultant is a necessary and sufficient condition 
for the two polynomials to have a common root. Three 
methods are known in the literature for computing the 
resultant, owing to Sylvester, Bezout and Cayley [Sal85]. 
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Each of them expresses the resultant as determinant of a 
matrix. The order of the matrix is different for different 
methods. We use Cayley’s formulation as it results in a 
matrix of minimum order. 

Given two polynomials, F(x) and G(x) of degree m 
and 7i, respectively. Without loss of generality we assume 
that m > n. Let’s consider the bivariate polynomial 

= 

x — a 

P{x , a) is a polynomial of degree m - 1 in i and also in 
or. Let us represent it as 

P(x,a) = P 0 (x) + P,(a:)a + P 2 (x)a 2 +... + , 

where P t (x) is a polynomial of degree m — 1 in x. The 
polynomials Pi(x) can be written as follows: 


from Bezier to power basis can introduce numerical er¬ 
rors [FR87]. To circumvent this problem we perform a 
reparametrization. Given 

p/j\ fEj=oWiXi £)Lq Wj Yj Bj >w ( t) \ 

\ WiBi t n (£) J 


Dividing by (1 —t) n and substituting s = results in 


P (s) = 


( Er=o wiXi yrjs' ^ ^ a*} 

v E -° w ’(") 5 ’ j 


The rest of the algorithm proceeds by computing the 
implicit representation of P($) and computing a matrix 
formulation by Cayley’s method as 


Po(x) \ 


/ A.0 

Po,l . 

Po,m-l > 


i \ 

«(*) 

__ 

Pi,0 

Pi, 1 

P 1 ,m-] 


X 

* 2 

\P m -i(x)J 


V m-1,0 

Pm — 1,1 

• • Pm — 1 ,m—1 / 


V m -'J 


Let us denote the m x ro matrix by M. The determinant 
of M is the resultant of F(z) and G(x) [Sal85]. Let us 
assume that x = xo is a common root of the two poly¬ 
nomials. Therefore, / > (xo>o r ) = 0 for all a. As a result 
Pi(x o) =0 for 0 < i < m. This condition corresponds to 
the fact that M is singular and [1 xq Xq ... is a 

vector in the kernel of M . 

We use Cayley’s resultant formulation for implicit- 
izing parametric curves and eliminating a variable from 
a pair of algebraic equations representing algebraic plane 
curves. More details on the properties of implicit repre¬ 
sentation, computation and accuracy are given in [MD92]. 

2.2. Implicitizing Parametric Curves 

Given a rational Bezier curve, P(t): 

p (0= 



1 ^ 


(1 \ 


3 


<(1 - t) m " 2 

M 

S 2 

= M 

t 2 (] - f) m " 3 


j 


r - 1 / 


The right hand side is obtained by substituting s = 
(fTTy and multiplying vector by (1 - t ) m ~ l . Later on, 
this relationship will be used to compute the inverse 
coordinates of the intersection points. 

2.3. Intersecting Parametric Curves 

Given two rational Bezier curves, P (t) and Q(u) of 
degree m and n respectively, the intersection algo¬ 
rithm proceeds by implicitizing P(/) and obtaining 
a m x m matrix M, w'hose entries are linear combi¬ 
nations of symbolic coefficients Y< W. The second 
parametrization Q(tz) = (x(u), y(u), w(u)) is substi¬ 
tuted into the matrix formulation. It results in a ma¬ 
trix polynomial M(u) such that each of its entry is a 
linear combination of x(u),y(u) and w(u). The inter¬ 
section points correspond to the roots of 


'«(*) y(*A = feo e" = oW ,y;g.,n(t)\ 

^ w ( t ) £<"=0 (0 / 


To implicitize the curve we consider the following system 
of equations 

Xw(t)-Wx(t) = 0 
Yw{t)-Wy(t) = 0. (1) 

Consider them as polynomials in t and X } Y t W are 
treated as symbolic coefficients. The implicit represen¬ 
tation corresponds to the resultant of (1) [Sed83]. 

We express the resultant as a matrix determinant. 
In this case the matrix has order n. Each entry of the 
matrix is of the form aoX -fai Y *f c *2 W, where ao, ai,Q 2 
are scalars and functions of the control points and weights 
used to represent the Bezier curve. The algorithm for com¬ 
puting the entries of the matrix assumes that the polyno¬ 
mial are expressed in power basis. However, converting 


Determinant (Af (u)) = 0. (3) 

2.4. Intersecting Algebraic Curves 

In this section we consider the intersection of two 
algebraic plane curves. They are represented as ze¬ 
ros of F(x i y) and G(x,y), polynomials of degree m 
and n, respectively. The polynomials may be rep¬ 
resented in power basis or Bernstein basis. Let the 
points of intersection in the domain of interest be 
(zi>yi)>" •,(&*>?*)' To simplify the problem we 
compute the projection of these points on the x- 
axis. Algebraically projection corresponds to comput¬ 
ing the resultant of F(x i y) and G(x, y) by treating 
them as polynomials in y and expressing the coeffi¬ 
cients as polynomials in x. 
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We compute the resultant using Cayley’s formu¬ 
lation. In case, the curves are expressed in Bernstein 
basis, we use the reparametrization highlighted in the 
previous section for implicitization. The resultant is 
expressed as a matrix determinant and each entry of 
the matrix is a polynomial in x. Let us denote the 
matrix by M(x). The problem of intersection corre¬ 
sponds to computing roots of Determinant(M(x)) = 
0. 

3. Matrix Computations 

In this section we present techniques from linear al¬ 
gebra and numerical analysis. We also highlight the 
numerical accuracy of the problems and the algorithm 
used to solve these problems in terms of their condi¬ 
tion numbers. 

3.1. Eigenvalues and Eigenvectors 

Given a n x n matrix A, its eigenvalues and eigenvec¬ 
tors are the solutions to the equation 

Ax = Ax, 

where A is the eigenvalue and x is the eigenvec¬ 
tor. The eigenvalues of a matrix are the roots of its 
characteristic polynomial determinant (A — AJ) = 0. 
An eigenvalue, A of multiplicity k corresponds to a 
root of multiplicity k of the characteristic polynomial. 
This multiplicity is also defined as the algebraic mul¬ 
tiplicity of the eigenvalue. Moreover, the dimension 
of the kernel of (A —A,*/) is the geometric multiplicity 
of A,. The geometric multiplicity is bounded by the 
algebraic multiplicity. 

Most eigendecomposition algorithms make use of 
the symmetric orthogonal transformations of the form 
A — QAQ _1 , where Q is any non-singular orthogo¬ 
nal n x n matrix. This transformation has the charac¬ 
teristic that the eigenvalues of A and A are identical. 
Furthermore, if y is an eigenvector of A , <2 -1 y is an 
eigenvector of A. The running time of the eigende¬ 
composition algorithms is 0(n 3 ). However, the con¬ 
stant in front of n 3 can be as high as 25 for computing 
all the eigenvalues and eigenvectors. 

3.2. Generalized Eigenvalue Problem 

Given nxn matrices, A and 5, the generalized eigen¬ 
value problem corresponds to solving 

Ax = A Bx. 

We represent this problem as eigenvalues of A — A B. 
The vectors x correspond to the eigenvectors of this 
equation. If B is non-singular and its condition num¬ 
ber (defined in the next section) is low, the problem 


can be reduced to an eigenvalue problem by multi¬ 
plying both sides of the equation by B~ l and thereby 
obtaining: 

B~'Ax = Ax. 

However, B may have a high condition number and 
such a reduction can cause numerical problems. Algo¬ 
rithms for the generalized eigenvalue problems apply 
orthogonal transformations to A and B. In particu¬ 
lar, we use the QZ algorithm for computing the eigen¬ 
values and eigenvectors for this problem [GL89]. Its 
running time is 0(n 3 ). However, the constant can be 
as high as 75. Generally, it is slower by a factor of 
2.5 to 3 as compared to QR algorithm for computing 
eigenvalues and eigenvectors of a matrix. 

3.3. Condition Numbers 

The condition number of a problem measures the sen¬ 
sitivity of a solution to small changes in the input. A 
problem is ill-conditioned if its condition number is 
large, and ill-posed if its condition number is infinite. 
These condition numbers are used to bound errors in 
computed solutions of numerical problems. More de¬ 
tails on condition numbers are given in [GL89, Wil65]. 
The implementations of these condition number com¬ 
putations are available as part of LAPACK [BDM89]. 

In our algorithm, we perform computations like 
matrix inverse and computing eigenvalues and eigen¬ 
vectors of a matrix. Therefore, we are concerned with 
the numerical accuracy of these operations. 

4. Reduction to Eigenvalue Problem 

In this section we consider the problem of intersecting 
parametric curves and reduce it computing an eigen¬ 
decomposition of a matrix. The same reduction is ap¬ 
plicable to the intersection of algebraic plane curves 
as explained in [MD92]. 

In section 2 we had reduced the problem of inter¬ 
secting parametric curves, P (t) and Q(u) of degree m 
and n, respectively, to finding roots of a matrix de¬ 
terminant as shown in (3). Each entry of the m x m 
matrix, M(u), is a linear combination of Bernstein 
polynomials of degree n in u. Let us represent it as a 
matrix polynomial 

M(u) = M n u n + Af n -itx n “ 1 (l — u) -f ... 4 - M 0 (l — «)”, 

where M* is a matrix with numeric entries. On divid¬ 
ing the equation by (1 — u) n we obtain a polynomial 
of the form 

Mn^rr^—T + —) n_1 + ... + M 0 . 

1 — u 1 — u 

Substitute s — yf— and the new polynomial is of the 
form 

M{s) = M n s n + M n - lS n -'+ ... + M 0 . ( 4 ) 
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In the original problem we were interested in the 
roots of Determinant(M(u)) = 0 in the range [0,1]. 
However, after reparametrizing we want to compute 
the roots of Determinant(M(s)) = 0 in the range 
[0,oo]. This can result in overflow problems if the 
original system has a root u « 1. In such cases M n 
is nearly singular or ill-conditioned. Our algorithm 
takes care of such cases by linear transformations or 
using projective coordinates. 

Let us consider the case when M n is a well- 
conditioned matrix. Given M(s), we multiply the 
matrix polynomial by M” 1 . As a result we obtain 

M (*) = I m s n + Mn-is”- 1 + . . . + Mo, 

where I m is an m x m identity matrix and M,* = 
for all : < n. Given the matrix polynomial 

_ t 

M ($) we compute a nm x nm matrix of the form 



such that the eigenvalues of C correspond exactly to 
the roots of Det(M(s)) = 0. Furthermore C is a nu¬ 
meric matrix of order mn. The proof of this property 

is given as part of Theorem 1.1 [GLR82]. The re- 

_ # 

lationship between C and M (s) is identical to that 
between the characteristic polynomial of a matrix and 
its equivalent companion matrix [Wil65]. 

_ / 

Let A be an eigenvalue of C. As a result M (A) 

is a singular matrix. Let v = [vi v<i ... v m ] T be the 
_# 

vector in the kernel of M (A) such that 

M (A) (vi v 2 ... v m ) T = (0 0 ... 0) r . (6) 

Corollary I: The eigenvector of C correspond¬ 
ing to the eigenvalue A has the form V = 
[v Av A 2 v ... A n ~ 1 v] T , where V is a mn x 1 vector . 
Proof: [MD92]. 

It follows that the eigenvalues of C correspond ex¬ 
actly to the preimages of intersection points on Q(u). 
However, we are only interested in the eigenvalues in 
the range s 0 G [0,oo] and the preimages on the curve 
are obtained by substituting uq = This gives us 
a list of all the intersection points on <5j(uo) such that 
wo G [0,1]. However, these points on P(i) may not lie 
in the range t G [0,1]. As a result it is important for 
us to compute the preimage of the intersection point 
(^OiS/OjUJo) = Q(uo) with respect to P(t). We use 
the property of the linear system of equations (2) and 
Corollary I. 


Let us assume that (xo,2/o, w>o) is a simple point 
on P(2). Points of higher multiplicity are accounted 
for in the next section. Substitute for (X t Y, W) = 
(£ 0 ) 2 /OiU>o) in the matrix, M as shown in (2), corre¬ 
sponding to the implicit representation of P(f). The 
resulting matrix is singular and let us assume that 
its kernel has dimension one. Kernels of higher di¬ 
mension correspond to higher order intersection. The 
vector in the kernel corresponds to v shown in (6). 
Given the eigenvector of C corresponding to the eigen¬ 
value so, use Corollary I to compute the eigenvector 

v. Given v we use the structure of the linear system 
to compute the preimage of the point (zo>yo, u>o) by 
using the relation 

((l—<o ) m " 1 <o(l — *o) m 2 ... <0* —k{y\V2... Vmf, 

where k ^ 0 is a constant. Thus, to = ^ . The 

relationship between the eigenvalue so of C } elements 

vi, V 2 of the eigenvector V corresponding to so and 
the point of intersection (xo, J/ 0 i w>o) can be expressed 
as 

(so.yo.wo) = Q(tx~ ) = P( )• ( 7 ) 

1 + So V\ + V2 

As a result we are able to compute all the points of 
intersection in the domain of interest by computing 
the eigendecomposition of C. 

Let us consider the case, when the matrix M n 
in (4) is ill-conditioned. One such case occurs when 
the preimage of the point of intersection on P(*) is 
<o »1. As a result, computation of M~ x and the cor¬ 
responding reduction to the eigenvalue problem can 
introduce numerical problems. The general approach 
for solving such cases is the reduction to generalized 
eigenvalue problem. In this case we construct com¬ 
panion matrices of the form [GLR82, Section 7.2] 



such that the eigenvalues of C\S + C2 correspond ex¬ 
actly to the roots of Det(M(s)) = 0. Furthermore, the 
eigenvectors of this generalized system have a struc¬ 
ture similar to V highlighted in Corollary I and is 
used for computing the inverse image of the intersec¬ 
tion point with respect to P(t). 

Solving a generalized eigenvalue system is more 
expensive than the normal eigenvalue system (almost 
a factor of 3). In many cases, we can perform a linear 
transformation on the coordinate of the matrix poly¬ 
nomial and reduce the resulting problem to an eigen¬ 
value problem. The basic idea involves transforming 
s = JIJj, where a,6,c and d are random numbers. 

The matrix polynomial M{s) in (4) is transformed 
into 
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=► P(S) = Pn3 n + Pn-is”" 1 + . . . + AS + P„, 

where P{ s are computed from the Mj' s. If P n is 
a well-conditioned matrix then the problem of inter¬ 
section is reduced to an eigenvalue problem, otherwise 
use a different transformation (by a different choice of 
a,6,c and d). The linear transformation is performed 
up to four or five times. If all the resulting leading ma¬ 
trices, P„, are ill-conditioned, the intersection prob¬ 
lem is reduced to a generalized eigenvalue problem. 
There are cases when any linear transformation can 
result in an ill-conditioned leading matrix. Further¬ 
more, the domain of the eigenvalue system obtained 
after transformation is [si,^] or [$2>si] depending 
upon the signs of a,6,c and d, where s\ = — j and 



4.1. Implementation and Performance 

The reduction to an eigenvalue or a generalized eigen¬ 
value system involves estimating the condition num¬ 
ber of a matrix, computing the matrix inverse and 
finding the eigenvalues of a matrix. For eigenvalues 
lying in the domain of interest, we compute the corre¬ 
sponding eigenvectors. Furthermore, we also compute 
the condition number of each eigenvalue in the do¬ 
main of interest. The condition number is a function 
of the left and right eigenvectors of the matrix. 

We used LAPACK implementation of eigende- 
composition algorithms. Some of the routines were 
modified to compute the eigenvalues in the domain 
of interest. Furthermore, the domain was specified as 
a + j/?, where a > -e, \ji\ < e and j = y/—l. e is a 
small positive constant used to account for the numer¬ 
ical errors. In particular, we make e a function of the 
condition number of M n or P n for eigenvalue prob¬ 
lems. To compute the inverse coordinate of the inter¬ 
section point, the right eigenvector V corresponding 
to the eigenvalue $o is computed. Let 

v = [vi,i t> lj2 . . . Wi.m V 2 ,l • • • V 2 ,m • • • V„,1 . . . 


The performance of the algorithm is largely gov¬ 
erned by the eigen decomposition routines. Roughly 
80 — 85% of the time is spent in these routines. The 
eigenvalue algorithms compute all the eigenvalues of 
the given matrix. It is difficult to restrict them to 
computing eigenvalues in the domain of interest. The 
order of the matrix, say p, corresponds to the prod¬ 
uct of the degree of the two curves and the number of 
eigenvalues is equal to the order. The running time of 
the algorithm is a cubic function of p. The eigenvalue 
algorithms are iterative and have good convergence 
properties. It is a long observed fact that the algo¬ 
rithm requires two iterations per eigenvalue. As a re¬ 
sult it is possible to bound the actual running time of 
the eigenvalue computation by 10p 3 for most cases. 
Furthermore the eigendecomposition algorithms are 
backward stable. We have been to able accurately 
compute the intersections of curves of degree up to 
ten. In practice it is possible to obtain accurate so¬ 
lutions for matrices of order 100 or more. This is in 
contrast with computing roots of high degree univari¬ 
ate polynomials (which is an ill-conditioned problems) 
or using symbolic computation for determinant com¬ 
putation and finding the roots of the resulting poly¬ 
nomial expressed in Bernstein basis using subdivision 
and iteration (which is relatively expensive and has 
slow convergence). 


4*2. Example of Curve Intersection 

In this section we illustrate the algorithm by 
considering the intersection of two rational cu¬ 
bic Bezier curves. The example is taken from 
[Sed83]. The control points of two Bezier curves 
(as shown in Fig. II), expressed in homogeneous 
coordinates, are (4,1,1), (5,6,2), (5,0,2), (6,4,1) and 
(7,4,1), (1, 2,2), (9,2,2), (3,4,1). Thus, 


w ( w(t)’w{ty' 


Analysis of the accuracy of eigenvector computation 
indicates that each term of the eigenvector has a 
similar bound on the absolute error of the compu¬ 
tation. As a result we tend to use terms of maxi¬ 
mum magnitude to minimize the error in the compu¬ 
tation [MD92]. In this case we compute the entries of 
v = [v 1 v 2 ... v m ] T as: 


[vi v 2 ... v m ] T 


( [*>1,1 t>i,2 . . . Vl,m] T So < 1 

\ (■5o) n v n,2 • • * Sq > 1 


Given v the inverse coordinate, is computed using 
v \, v 2 or v m -i, v m by making use of similar numerical 
properties. 


where 

x(t) = 4(1 - t) 3 + 30(1 - tft + 30(1 - t)t 2 + 6t 3 

y(t) = (1 - t) 3 + 36(1 - t) 2 t + 4 1 3 

w(t) = (1 - e) 3 + 6(1 - t) 2 t + 6(1 - t)t 2 + < 3 . 

The implicit representation has a matrix determinant 
formulation given as 

— 114-f 30a? — 6y 30tl» — 6x — 6y — 10u; 4* 3x — 2y \ 
30 w-6x-6y 1070u/ - 213a; - 2y 96u/ - 12x - 6y j 

— lOu? + 3x — 2y 96u/ — 12a: — 6y — 120tx/ 4- 24x — 6yJ 
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Intersection of rational cubic Bezier curves 


The second parametrization, Q(u) is substituted 
into the matrix formulation (after a reparametrization 
of the form s = yf-j). The resulting matrix polyno¬ 
mial has the form 



The condition number of the leading matrix is 9.525. 

Multiplying M(s) with the inverse of the lead¬ 
ing matrix and constructing the equivalent compan¬ 
ion matrix results in 


C = 


( 0 0 0 

0 0 0 

0 0 0 

0 0 0 

0 0 0 

0 0 0 

1.48-1.07 0 
0.13 0.94 0 
\ —.07 .44 .33 


1 

0 

0 

0 

0 

0 

-11.92 


0 

1 

0 

0 

0 

0 

4.58 


0 

0 

1 

0 

0 

0 

0.40 


0 

0 

0 

0 

1 

0 


17.8-8.24 0.40 


—0.53 —11.92 —.22 1.06 11.4 -0.23 
.30 -.53 —9.16—.61 4.74 6.83 / 


The eigendecomposition of C results in 9 points of 
intersection as shown in the following table. 


eigenvectors we choose the elements v\ or vq depend¬ 
ing upon their relative magnitudes. The error bounds 
in the third column are obtained by using the con¬ 
dition number of the eigenvalues (as explained in 
[MD92]) and matrix norm as 

Ei = c || C || cond*, (8) 

where e =■ 2.2204e — 16 is the machine precision for 
64 bit IEEE floating point arithmetic and cond; is 
the condition number of the ith eigenvalue. As a re¬ 
sult, the eigendecomposition algorithms compute the 
eigenvalues of C up to 12 digits of accuracy. The 
other sources of error arise from the computation of 
the entries of M, the matrix corresponding to the 
implicit representation, and inverting the leading ma¬ 
trix of the matrix polynomial M(s). In our case, this 
can account for inaccuracy of one digit (due to con¬ 
dition number of the matrix to be inverted). As a 
result, the intersection points are computed up to 11 
digits of accuracy. Further accuracy can be obtained 
by using one or two iterations of Newton’s method 
on the equations used for representing the problem 
of intersection. The output of the algorithm are the 
starting points for Newton’s method. As a result the 
algorithm can be used to compute intersection points 
with high accuracy. 

5. Higher Order Intersection 

In the previous section, we presented an algorithm 
to compute the simple intersections of parametric 
curves. In this section we extend the analysis to 
higher order intersections. 

According to Bezout’s theorem two rational 
curves of degree m and n intersect in mn points 
(counted properly) [Wal50]. In our case, the preim¬ 
ages of these points correspond to the mn eigenvalues 


Intersection 

Number 

Eigenvalue 

50 

Max. Error 
Ei 

Parameter 

uq = r-p— 
l + sn 

Eigenvector’s 
a = tn|v8 

Eigenvector’s 

P = v 2 \v$ 

Parameter 

<0 = 

Point 

{X,Y) 

1. 

15.369 

2.32e-14 

0.9389 

0.2173 

0.0472 

0.1785 

(4.619,3.412) 

2. 

11.802 

2.85e-14 

0.9219 

0.6657 

0.4432 

0.3997 

(4.911,3.289) 

3. 

5.507 

2.71e-14 

0.8463 

0.0703 

1.000 

0.9343 


4. 

1.4654 

1.27e-13 

0.5944 

0.1614 

1.000 

0.8610 

IKtiK'liMMURIlMl 

5. 

0.5361 

2.32e-14 

0.3490 

1.00 

0.066 

0.0622 

(4.298,2.378) 

6. 

0.1534 

2.98e-14 

0.133 

1.00 

0.1233 

0.1099 

(4.455,2.971) 

7. 

0.0974 

2.38e-14 

0.0888 

1.00 

0.7277 

0.4212 

(4.931,3.218) 

8. 

1.145 

1.18e-13 

0.534 

1.00 

0.4644 

0.683 | 

(4.174,2.290) 

9. 

0.0382 

1.14e-14 

0.0369 

0179 

0.00032 

0.9823 



Eigendecomposition and intersection points 


The intersections points are computed using the 
relationship highlighted in (7). They are: In the 
columns corresponding to the components of the 


of C (5). Higher order intersections are the points 
having more than one preimage. In other words, 
eigenvalues of multiplicity greater than one corre- 
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spond to higher order intersection points. The in¬ 
tersection multiplicity of these points corresponds to 
the algebraic multiplicity of the corresponding eigen¬ 
value. These intersections arise due to the tangential 
intersection of the two curves at the point of contact 
or the intersection point is a singular point on at least 
one of the curves. Some higher order intersections are 
highlighted in Fig. III. They are: 

(a) Tangential intersection of two ellipses. The inter¬ 
section multiplicity is 2. 

(b) Second order intersection of a parabola with a 
curve having a loop. 

(c) Intersection of an ellipse and a curve with a cusp. 
The intersection multiplicity is 2. 

(d) Tangential intersection of a parabola with a loop. 
The intersection multiplicity is 3. 



(a) 



<b) 



An intersection point, P, has multiplicity fc, if a 
small perturbation in the coefficients of the curve (or 
the coefficients of the control polygon) results in k 
distinct intersection points in the neighborhood of P. 
Given two curves that intersect with multiplicity k at 
P y our algorithm computes the implicit representation 
and reduces the problem to an eigenvalue problem. 
The latter computation involves matrix inversion and 
multiplications using floating point operations. As a 
result, C in (5), corresponds to a slightly perturbed 
problem and k of its eigenvalues, say Ai, A 2 ,..., \k 
are very close to A, the eigenvalue corresponding to 
P (assuming exact arithmetic). The eigendecomposi- 
tion algorithms use floating point arithmetic and the 
computed eigenvalues correspond to Aj, A 2 ,..., A fc . 


However the problem of computing eigenvalues of 
multiplicity greater than one can be ill-conditioned 
[GL89, Wil65]. As a result A^ may agree with A up 
to a few digits of accuracy. This can be determined 
from the condition number of A t . In such cases the 
mean of A t ’s given by 

j __ Ai + A 2 -f ... + A fc 

A " k 

is much better conditioned. This can be verified by 
computing the condition number of a cluster of eigen¬ 
values [BDM89]. As a result A is very close to A, the 
eigenvalue of multiplicity k. The number k corre¬ 
sponds to the number of ill-conditioned eigenvalues, 
A t -, located in the small neighborhood of each other. 
We illustrate this technique using the following exam¬ 
ple. 

Example: Consider the intersection of cubic curve, 
P(£) = (x(t) y y(t)) = (t 2 — M 3 — t) with the parabola 
Q(u) = (x(u),y(u)) = (u 2 + u,u 2 — ti) (as shown in 
Fig. IV). The cubic curve has a loop at the origin. We 
are interesting in computing all the intersection points 
corresponding to the domain t x u = [—2,2] x [— 1,1]. 


Y 


X 


Fig. V 

Higher order intersection of a cubic curve and a 
parabola 

The implicit representation of P (t) is a matrix 
determinant of the form 

( —w — x —y w + x \ 

—y x 0 J . 

w + x 0 —w J 

After substituting and reducing the problem to an 
eigenvalue problem we obtain a 6 x 6 matrix 

0 \ 

0 

1 
0 
0 

-1 


c = 


I % 
0 

-1 

v:! 


1 

0 

0 

-1 


1 -2 -1 

1 -2 -2 
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The relevant eigenvalues of this matrix and their con¬ 
dition numbers are: 


Intersection 

Number 

Eigenvalue 

a; 

Cond. Number 
Condt 

1 . 

0.0 

1.2171 

2. 

-0.00000001296346 

4.86e09 

3. 

0.00000001296346 

4.86e09 


Eigenvalues corresponding to higher order 
intersections 


Thus, the second and third eigenvalues have a high 
condition number. Taking their average we obtain 
A =0.0, and it has a low condition number. As a 
result, we conclude that A = 0.0 is an eigenvalue of 
multiplicity 3 and the curves have a triple intersection 
at Q(0.0) = (0.0,0.0). 

Q.E.D. 

The procedure highlighted above is used for com¬ 
puting the intersection multiplicity of the point. How¬ 
ever, the intersection can arise from tangential inter¬ 
section, singular points or their combinations. The 
rest of the analysis deals with determining the nature 
of intersection. 

Given an eigenvalue, A, of algebraic multiplicity 
fc, its geometric multiplicity corresponds to the di¬ 
mension of the kernel of (C — A/). After accurately 
computing the algebraic multiplicity of the eigen¬ 
value, we compute its geometric multiplicity. Further¬ 
more, the algorithm computes a basis of the vectors 
spanning the kernel. Depending upon the nature of 
intersection the geometric multiplicity is less than or 
equal to algebraic multiplicity. The exact relationship 
between the multiplicities and nature of intersection 
has been described in [MD92]. Here we highlight the 
relationship for the examples in Fig. Ill and IV. We 
assume that the curves drawn in dark font are be¬ 
ing implicitized. The curves highlighted in light font 
are substituted into the implicit representation. The 
choice of curve for implicitization has an impact on 
the geometric multiplicities of the matrix, although 
the algebraic multiplicities are unaffected [MD92]. 


Example 

Figure 

Algebraic 

Multiplicity 

Geometric 

Multiplicity 

111(a) 

2 

1 { 

ni(b) 

2 

2 

III(c) 

2 

2 

ni(d) 

3 

2 

i IV 

3 

2 


Algebraic and geometric multiplicities of 
eigenvalues corresponding to Figs. Ill and IV 

In case the geometric multiplicity is 1, the computa¬ 
tion of the preimage corresponds exactly to the pro¬ 
cedure described in the previous section. If the ge¬ 


ometric multiplicity is greater than 1, the algorithm 
for preimage computation involves equation solving. 
Let us illustrate for Fig. V. 

In the example corresponding to Fig. V, the 
intersection multiplicity is 3. If we implicitize the 
parabola and substitute the cubic curve into the im¬ 
plicit form, the eigenvalue corresponding to the origin 
has algebraic multiplicity 3 and geometric multiplic¬ 
ity 1 . This is due to the fact that the parabola is 
intersecting the cubic curve tangentially at the loop, 
corresponding to the origin. The fact that the inter¬ 
section point is a loop implies that P (t) has two dis¬ 
tinct preimages t\ = 1 and t 2 = - l . As a result both 
the vectors vi = [1t?] T and v 2 = [1 t 2 t 2 ] T lie in the 
kernel of M after we substitute x = 0,y = 0,iy = 1. 
Since these vectors are linearly independent they con¬ 
stitute the basis of the eigenvectors corresponding to 
A = 0. However, the eigendecomposition algorithm 
can return any two linearly independent vectors of 
the form Vi = aiVi -f- a 2 v 2 and V 2 = /?ivi + /? 2 v 2 , 
where ot{ ’s and ftj ’s are scalars. The rest of the al¬ 
gorithm involves computing vi and v 2 from Vi and 
V 2 . This corresponds to computing the scalars and 
can be achieved by equation solving [MD92]. 

6, Conclusion 

In this paper we have highlighted a new technique 
for computing the intersection of parametric and al¬ 
gebraic curves. The algorithm involves use of re¬ 
sultants to represent the implicit representation of a 
parametric curve as a matrix determinant. The in¬ 
tersection problem is being reduced to an eigenvalue 
problem. The algorithm is very robust and can ac¬ 
curately compute the intersection points. There is a 
nice relationship between the algebraic and geomet¬ 
ric multiplicities of the eigenvalues and the order of 
intersection. We used this relationship in accurately 
computing such intersections. Efficient implementa¬ 
tions of eigenvalue routines are available as part of 
linear algebra packages and we used them in our im¬ 
plementations. 

The approach highlighted here is also useful for 
intersecting curves and surfaces. In particular, the 
implicit representation of a parametric surface can be 
represented as a matrix determinant [MC91]. The 
parametric space curve is substituted into the im¬ 
plicit formulation and the problem can therefore, be 
reduced to an eigenvalue problem. This can be di¬ 
rectly used for ray tracing parametric patches, as a 
ray corresponds to degree one parametric curve. 
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Abstract 

This paper describes a technique for computing the in¬ 
tersections of two parametric surfaces based on interval 
arithmetic. The algorithm, which can be stopped and 
restarted at any point, uses search techniques to refine its 
description of the intersections progressively. Interval 
arithmetic provides guaranteed points on the intersec¬ 
tion curves to within a user-specified tolerance. These 
points are connected into polygons and used to trian¬ 
gulate the trimmed surfaces. We provide details of an 
implementation and give examples of the algorithm’s 
use. 

Resume 

Cet article ddcrit une technique pour calculer les inter¬ 
sections de deux surface paramdtriques avec l’intervalle 
arithmdtique. L’algorithme utilise des techniques de 
recherche pour augmenter la precision des intersec¬ 
tions, et peut etre interrompu ou redemarrer h n’importe 
quel moment. L’intervalle arithmdtique nous donne des 
points garantis sur les courbes d’intersections. Nous 
erdons des polygons avec les points et triangulons les 
regions interieurs. Nous exposon en detail un moddle 
de mise en oeuvre et nous donnons des exemples de 
^utilisation de Palgorithme. 

Keywords: Surface Intersection, Interval Arithmetic 

1 Introduction 

Finding the intersection of two parametric surfaces 
is an important problem in Computer Aided Geomet¬ 
ric Design. It is useful in many applications such as 
trimming surfaces and performing boolean operations 
on boundary representation geometric models[17]. The 
difficulty of this problem forces solutions to trade gen¬ 
erality, robustness, and performance. 


Here, we present a method for intersecting parametric 
surfaces based on interval arithmetic. The method is 
very general, placing few restriction on the class of sur¬ 
faces it can handle. Nonetheless, the intersection points 
it finds are are guaranteed to be within a specified tol¬ 
erance. Since the method continually refines its results, 
a valid estimate of the intersection is always available 
during execution. As a consequence, the algorithm can 
be stopped when tolerance criteria or time bounds are 
met and restarted if the results are unacceptable. Ad¬ 
justing the tolerances makes it posible to trade accuracy 
or sampling rate for computation time. 

In sections two and three, we briefly review previous 
work on the intersection problem and describe the ba¬ 
sics of interval arithmetic. Then section four describes 
the algorithm for finding intersection points, formulat¬ 
ing the the task as a search problem. Finally, section 
five addresses the issue of linking the intersection points 
together and triangulating the bounded regions of param¬ 
eter space. Results from our prototype implementation 
are presented for a variety of shapes. 

2 Related Work 

Surface intersection problems have been widely stud¬ 
ied because of their practical importance (see [16] or [12] 
for a survey). The general problem is to find the set of 
points where two surfaces coincide in space. While two 
surfaces typically intersect at a set of space curves, the 
intersection may also contain distincL points or surface 
elements in degenerate cases. 

Parametric representations define surfaces by maps 
from the plane to three-dimensional Euclidean space. 
These representations are extremely popular because of 
their convenience for a variety of modeling and ren¬ 
dering purposes. Unfortunately, parametric surfaces are 
very difficult to intersect[16]. 

In general, exact analytical solutions for surface inter- 
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section problems are unavailable or impractical, since 
even simple surfaces can meet at very complicated 
curves[ll]. As a consequence practical solutions to 
the intersection problems must resort to approximating 
the solution. Following Barnhill et. al.[4], we charac¬ 
terize these approximations by tolerances which specify 
how closely the approximation must match the actual 
intersection. 

The literature on surface intersections contains a 
wide variety of approaches. The most common are 
marching and subdivision. Marching methods (e.g. 
[3, 14, 12, 5, 4, 6]), begin with points known to be 
at the intersection of the two surfaces and use numerical 
techniques to compute successive points on the intersec¬ 
tion curve. In addition to the numerical challenges of 
progressing around the curve, these techniques face the 
additional task of finding the initial points from which 
to begin marching. 

Subdivision is another approach to finding the inter¬ 
section of two surfaces. The basic idea (e.g. [16, 9,7]) 
is to divide the surface intersection problem into smaller 
pieces until each piece is a solvable problem. For exam¬ 
ple, Houghten et. al. [9] subdivide surfaces until each 
piece is nearly planar and use the fact that intersections 
of planar elements can be calculated directly. 

One difficulty with subdivision approaches is that they 
require a way of deciding whether the subproblems are 
adequately modeled by the solvable problems, for ex¬ 
ample deciding if a surface segment is nearly planar 
[9]. A second difficulty is that the results are often not 
guaranteed at all, or only guaranteed for a very restricted 
class of surfaces. For example, some powerful recent re¬ 
sults (e.g. [19,18]) apply only to polynomial or rational 
functions. Even if the results can be guaranteed, many 
subdivision techniques have poor performance [19] be¬ 
cause they need to search exhaustively for intersections. 

Our approach has several advantages over typical sub¬ 
division approaches. The use of interval arithmetic per¬ 
mits us to make guarantees about finding points on the 
intersection curve without placing severe restrictions on 
the class of surfaces thealgorithmcan handle. The para¬ 
metric mappings can be expressed in terms of trigono¬ 
metric functions, for example, with no particular diffi¬ 
culty. In addition, the subdivision strategy we use avoids 
exhaustive search in most cases. By adjusting the toler¬ 
ances used for stopping conditions, the tradeoff between 
time and accuracy can be user controlled. 

Interval arithmetic has been used for a variety of pur¬ 
poses in computer graphics[15, 13, 1, 20, 21]. Mudar 
and Koparkar [15] present the basic idea of using inter¬ 
val arithmetic to identify surface intersections but make 
no mention of the issues involved in creating efficient 


reliable algorithms which provide descriptions of inter¬ 
section curves within user specified tolerances. The 
work of Von Herzen and Barr[22] is very similar, using 
Lipschitz conditions to evaluate bounding regions for 
portions of surfaces. The Lipschitz conditions are de¬ 
rived by hand for each new analytic surface, unlike the 
automatic interval arithmetic used here. Von Herzen and 
Barr also do not address the issue of finding the intersec¬ 
tions themselves, instead relying on implicit functions 
for breaking objects into pieces. 


3 Interval Arithmetic 

Interval arithmetic is a method for providing a bound 
on the output of a function given bounds on all of its 
inputs. This section provides a brief introduction to in¬ 
terval arithmetic and its relevance to the surface/surface 
intersection problem. 

Interval arithmetic is based on the idea of extending 
ordinary scalar operations to intervals on the real line. 
If S’ is an interval, we can write it as (S min , S max ) to 
denote a quantity whose value lies somewhere between 
Smin and Smax- With every ordinary scalar function, 
say f(s,t), we associate an interval function F(S,T), 
which provides a bound on f(s,t) given bounds on s 
and t. We begin by defining the interval functions corre¬ 
sponding to primitive operations (e.g. basic arithmetic 
operations and trigonometric functions). For example, 
if f(s,t) = s -t- 1, we can define F(S,T) to be the in¬ 
terval (Smin + > Smax + T max ). Clearly, if s and 

t are within their bounds, their sum must lie in the inter¬ 
val F(S, T). Similar rules can be developed for a wide 
variety of elementary functions[2]. 

Once we have defined a set of interval functions cor¬ 
responding to primitive operations, we can create more 
complicated interval functions by composing them. The 
interval function corresponding to f(g(q, r), h(s, <)), for 
example, is simply F(G(Q, R) , H(S, T)). We have au¬ 
tomated this operation by defining a set of operations on 
intervals using the operator overloading capabilities of 
C++. 

Parametric surfaces are defined by mappings from 
(tx, v) to (*, y, z). If we use interval arithmetic to rep¬ 
resent the mapping, then we have an interval function 
which maps from(C/, V r )to(X,V', Z). The interval map¬ 
ping provides an axis-aligned bounding box in world 
space for every rectangular region of parameter space. 
The bounding box may not be a tight bound, but we are 
guaranteed that it contains the piece of the parametric 
surface defined by the rectangular region of parameter 
space. 
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4 An Interval Approach to Surface Inter* 
section 


Suppose that we have two parametric surfaces and 
their corresponding interval functions. If we pick a 
rectangle in each parameter space, the interval functions 
provide a pair of bounding boxes, one for each surface. 
We make use of the bounding boxes as follows. If 
the bounding boxes do not overlap, we know that the 
surfaces do not intersect in the corresponding parameter- 
space rectangles. In that case, we need not examine 
these regions of parameter space any further. If the 
bounding boxes do overlap, the surfaces might intersect 
in the corresponding regions of parameter space, but we 
cannot be sure that they do. To learn more, we can 
subdivide the parameter space. Suppose we subdivide 
until the bounding boxes all have diagonals smaller than 
t/2. Then if we find a pair of intersecting bounding 
boxes from the two surfaces, we can conclude that the 
surfaces approach each other to within a distance of e 
in the corresponding regions of parameter space. We 
refer the corresponding parameter-space regions in such 
a case as a “dot” and its “mate.” Each dot gives us a point 
on the intersection curve to within the dot tolerance e. 
The problem is to find an appropriate set of dots which 
can be linked together to form the trimming curve. 


4.1 A Simple Interval Intersection Method 

One way to find an appropriate set of dots is to divide 
each parameter space into a uniform grid, but the cost 
of such a subdivision is prohibitive. Instead, interval 
algorithms usually divide space hierarchically, only di¬ 
viding up regions of space which may contain solutions. 
In addition to time and space efficiency, the hierarchical 
alogithms have the advantage of progressive refinement 
approaches: at all times there is a valid approximation 
of the entire solution and the approximation improves as 
the algorithm progresses. 

To avoid having to compare every square in one pa¬ 
rameter space against every square in the other, each leaf 
node of the tree maintains a list of the leaf nodes in the 
other tree which it overlaps. Since the bounding volume 
of a child must be completely contained within the vol¬ 
ume of its parent, when we subdivide, we only need to 
check the new children against the boxes intersected by 
their parents. The subdivision step of our algorithm is: 


Subdivide(node) 

if node’s intersect list is not empty 
subdivide node into children 
for each i in node’s intersect list 
remove node from i’s intersect list 
for each child of node 
if child overlaps i 
add i to child’s intersect list 
add child to i’s intersect list 


The basic algorithm for finding intersections is to pick 
a leaf node from one of the trees and subdivide it. A 
list of “live” leaf nodes (i.e.: ones which contain over¬ 
laps) provides a description of the current model of the 
intersection curve as well as a “to do” queue. How the 
next node to be divided is chosen from the list of poten¬ 
tial choices provides control over the search algorithm. 
An obvious choice is searching breadth-first by always 
choosing the node closest to the root of its tree, which 
produces an even distribution of sampling (Figure 1). 
However, we use the ability to control search to create 
algorithms which fit our needs. 

4.2 Search Strategy 

Our goal is to compute a set of points on the intersec¬ 
tion curves, link them into a polygonal approximation 
and triangulate the region bounded by the polygon (in¬ 
terior or exterior as appropriate). Doing this requires 
that we be able to find a set of isolated points on the 
intersection curves (dots) with a controllable sampling 
rate. We do this with a two-part search strategy. The 
first part is a breadth-first search which ensures that each 
region of parameter space that could possibly contain an 
intersection is subdivided to a minimum degree. After 
the breadth-first subdivision, we are left with a set of 
“live” regions of parameter space which could still con¬ 
tain intersections based on the interval arithmetic tests. 
Many of these live regions turn out to be false positives - 
regions which in fact do not contain any intersections de¬ 
spite overlapping bounding boxes. In the second stage 
of the algorithm, we resolve the false positives using 
depth-first subdivision. We either find a dot to witness 
the intersection, or prove that no intersection exists. Fig¬ 
ure 1 illustrates the results of the second search phase. 
The algorithm has proven that many of the live regions 
of parameter space in figure 2 really do not contain any 
intersections. In each of the remaining regions, the algo¬ 
rithm provides a dot and its mate in the other parameter 
space. 
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Figure 1: Stages in the breadth-first refinement of the intersection of a torus and a plane. On top is a display of 
the parameter space of the torus, on the bottom is the plane. All nodes are enclosed by squares. Leaf nodes with 
non-empty intersect lists are filled with grey. 


Beginning with breadth-first subdivision and contin¬ 
uing with depth-first subdivision allows us to control 
dot spacing and dot accuracy separately. The level of 
the breadth-first subdivision controls the spacing, and 
the level of the depth-first subdivision controls the ac¬ 
curacy. This separate control is extremely valuable in 
practical situations, and is lacking in many algorithms. 

5 Stringing and Triangulating 

The interval refinement algorithm presented in the 
preceding sections provides a bounding region on the 
curves and points on the curves. In this section, we 
consider the problem of connecting these points together 
to build a polygonal representation of the curve and 
to triangulate only a part of parameter space bounded 
by these curves, to provide a “trimming” operation of 
cutting one surface against another. 

The output of the interval refinement algorithm could 
be used to drive a marching method intersection. The 
points provide starting locations, and the bounding re¬ 
gions could help control the search. However, we are 
interested in directly applying the found points since 
we assume the user has specified tolerances which will 
provide a sufficient number of points to be found. 

The first thing to notice about the dot connection prob¬ 
lem is that the solution is not uniquely determined by the 
positions of the dots and their surrounding regions. Fig¬ 
ure 3 illustrates the kinds of ambiguity which can arise. 
In order to string the dots into a chain, we must make 
further assumptions about the underlying intersection 
curve. 

In stringing the intersection points together, we as- 



Figure 4: A “bump” is common case where our stringing 
assumptions fail. It is easy to create a hueristic which 
handles this special case. 


sume that the intersection curve has low curvature rela¬ 
tive to the grid size, and that different curves are always 
seperated by a grid cell at all points along their length. 
If we are interested in curves which do not cross, this 
restriction is acceptable if we pick a sufficiently small 
grid size. Under these assumption, an intersection curve 
will almost always pass through a cell exactly once, en¬ 
tering and exiting through different sides. Each grid 
cell which contains part of the intersection will be adja¬ 
cent to exactly two others, unless it is at an edge. It is 
straightforward to connect the dots in this case. 

Even if the intersection curve is well-behaved, quan¬ 
tization errors can cause the two neighbor assumption to 
be violated at any grid resolution. An example of such 
an error is the “bump” shown in Figure 4. Fortunately, 
this type of situation is not too difficult to deal with. If 
we remove the top-most dot in figure4, the two-neighbor 
condition is restored and it is easy to connect the dots. 
Our stringer identifies such situations and removes dots 
to resolve the ambiguity. 

Although we are unable to provide strong guarantees 
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Figure 2: Dot finding is applied to the example of Figure 1. The white circles represent dots. For each gray square 
in Figure 1, a search for a dot was executed. If no dot was found, the square is rejected, and is not shaded. 



Figure 3: Although the stringing order for a set of dots may seem obvious, the curve may actually do something 
else. Without making further assumptions, points and bounding regions cannot uniquely determine a stringing order. 
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about the robustness of our stringer, it has performed 
well in our limited tests. Obviously, if the intersection 
is not a curve, but rather some degenerate case such as 
a surface region, stringing will not succeed in building 
a curve. Curves which intersect, or meet with tangency, 
violate the two neighbor criteria. By delaying decisions 
about ambiguous cases, other parts of the curves can be 
built correctly, typically providing enough information 
to make the stringing decisions, or at least satisfy the 
user. In cases that remain ambiguous, our prototype 
implementation uses further depth-first subdivision to 
verify the dot positions. We continue to explore other 
ways to resolve ambiguous situations with additional 
subdivision 

Once the dots on each surface are connected, corre¬ 
sponding chains on each surface can be merged. This 
is important since we want to have one description of 
the curve which has the property that it includes a dot in 
each grid cell of both parameter spaces through which it 
passes. 

5.1 Triangulating Trimmed Surfaces 

One of our motivations for performing intersection 
calculations is to create trimmed surfaces which can be 
sewn together. In such cases it is crucial that the pieces 
can be assembled and mate together without “cracking” 
at the seams. In order for surfaces to fit together without 
cracks, their edges (as space curves) must be identical. 

Our intersection method provides us with chains of 
dots in the parameter space of each surface which form 
piecewise linear approximations to the edges of the 
trimmed surface. If we triangulate two regions of differ¬ 
ent parameter spaces bounded by the same chains, the 
triangles will match without cracking, as shown in figure 
5. 

In our prototype implementation, we use a flood fill 
to place triangles in each grid square bounded by the in¬ 
tersection curves and then march around the intersection 
curve to fill in small triangles around the edge. This sim¬ 
ple strategy has the disadvantage that it is not adaptive. 
To get sufficient detail around complicated intersections, 
we must also create large numbers of triangles in other 
areas of the surfaces. Techniques for doing adaptive 
subdivision could easily be added to our triangulation 
scheme, and in fact can employ information provided 
by the interval evaluations. 

6 Status and Directions 

We have implemented interval arithmetic as part of 
our mathematical toolkit in C-h-[ 8). The algorithm de¬ 
scribed in this paper has been implemented and incorpo¬ 


rated in an interactive scene composition program (used 
to create figures 5,6, and 7) and in a mathematical mod¬ 
eling system [10]. 

Figures 5, 6, and 7 show results of our algorithm. 
In figure 5, the intersection curve is used to cut the 
sphere and cone to create a boundary representation of 
the boolean subtraction. The front row shows the results 
for three different levels of initial breadth-first search. 
Note that since dot size is controlled independently of 
spacing, even coarse tesselations with few points on 
the intersection curve still yield objects without cracks. 
The boolean union is rendered in the rear center. On 
each side, the separate objects are rendered with textures 
showing their parameterizations. The red curve on the 
texture is the trimming curve found by the algorithm. 
We have rotated the sphere so that its trimming curve is 
visible. The checkerboard pattern shows the cells that 
would be created by a uniform subdivision of parameter 
space to the level used in the breadth-first search of the 
front-center subtraction. Note that the actual breadth- 
first subdivision is not uniform. The blue squares are 
breadth-first subdivisions through which the trimming 
curve passes. 

In figure 6, a sphere whose radius is modulated with 
a sinusoid is intersected with a torus. Figure 7 shows 
the intersection of a sin(r)/(r 4- e) height field with 
a sphere. In both cases, the objects have been texture 
mapped in the same manner as figure 5. Figure 7 shows 
the union from top and bottom. 

Although the prototype has been useful for creating 
a wide variety of interesting objects, there are several 
extensions to the basic algorithm which would be inter¬ 
esting to explore. For example, the search and trian¬ 
gulation techniques could be made to be non-uniform 
and adaptive to improve performance on surfaces with 
varying levels of detail. Stringing could be improved by 
using further subdivision to disambiguate difficult cases. 

The surface intersection technique presented here can 
handle a very wide variety of surfaces because of its 
reliance on interval arithmetic. The algorithm progres¬ 
sively refines its model of the curve until user specified 
tolerances are met allowing separate control over the 
spacing and accuracy of the intersection points. String¬ 
ing the points together and triangulating the resulting 
parameter-space regions makes it possible to construct 
crack-free boundary representations of objects which are 
difficult to create with other methods. 
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Figure 5: Trimming a cone against a sphere. Note that even coarse tcsselations arc crack-free. 


Figure 6: Intersection of a five-lobed surface and a torus. 


Figure 7: Intersecting a sphere with a sin(r)/r bump. The middle right is a bottom view of the middle left. 
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Abstract 

We present discrete physically-based methods for gen¬ 
erating polygonal approximations of implicit surfaces. 
These methods not only generate a combinatorial man¬ 
ifold approximating the surface, but also produce a 
structure that is well suited to numerical simulations 
in physically-based modeling and animation systems. 

Keywords: implicit models, physically-based models, 
polygonization, triangulation, domain decomposition. 


1.1 Polygonization of Implicit Surfaces 

To capture the geometry of an implicit manifold, we 
must sample and structure points on it. In this paper, 
our objective is to structure the points in order to obtain 
a combinatorial manifold M that is close to M in some 
suitable topology. The manifold M is called a polygonal 
approximation of the surface M. 

Polygonal approximations to implicit manifolds were 
first described in the classic paper (Allgower &: Schmidt, 
1985). The method proposed by Allgower and Schmidt 
consists of the following steps: 


1 Introduction 


1. Compute a triangulation of the ambient space; 


Consider a differentiable function F : R n —► R, for 
which 0 is a regular value. This means that the gradient 
vector 


V F{p) = 





( 1 ) 


is non-zero at all points p in the inverse image M = 
F -1 (0). In this case, the set M is a differentiable man¬ 
ifold of dimension — 1 that we shall simply call an 
implicit manifold (Spivak, 1965). 

Recently the use of implicit surfaces has attracted 
the attention of researchers in geometric modeling. Im¬ 
plicit surfaces are suitable for applying visualization 
techniques based on ray-tracing (see (Hanharan, 1983) 
(Barr, 1986)), but some difficulties arise when we try 
to sample or structure points on them in order to gain 
more information about their topology and geometry 
(Figueiredo, 1991). One of the important issues in this 
sampling and structuring problem is the computation of 
polygonal approximations to the surface. Polygonal ap¬ 
proximations enable us to use the fast, special purpose 
processors of graphic workstations in order to display 
implicit surface models. 


2. Replace Jhe function F by its simplicial approxi¬ 
mation F relative to this triangulation; 

3. Refine the triangulation so that F is close to F. 
The combinatorial manifold is then obtained as the 
inverse image F~ l ( 0) of the simplicial approxima¬ 
tion. 

The Freudenthal triangulation is the simplest trian¬ 
gulation in R n : the space is subdivided into cubes and 
the triangulation is obtained by subdividing each 71- 
cube into n! simplices. Figure 1 shows a two dimen¬ 
sional example. For more details the reader should con¬ 
sult (Allgower &: Georg, 1990). 

Several variations of Allgower’s method exist in the 
graphics literature (Wyvill et al., 1986), (Loreson &: 
Cline, 1987), (Bloomental, 1988), (Velho, 1989), (Hall 
&: Warren, 1990). The correct computation of polygo¬ 
nal approximations to implicit manifolds depends on a 
priori estimates of the variation of the surface geometry 
(this is the refinement step (3) in the above algorithm). 
For this reason, some of the aforementioned works in¬ 
volve the computation of adaptive polygonizations in 
order to get better approximations. 
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Figure 1: Freudenthal triangulation 

1.2 Physically-based Modeling 

Modeling is the most labor intensive part in the process 
of computer graphics. Modeling the motion of objects 
is often very difficult when the main goal is to generate 
realistic motion. The best solution to this problem is 
to model the physical habitat of the object: the mo¬ 
tion will be a consequence of the interaction between 
the object and its environment, according to the laws 
of physics. A discussion of this physically-based model¬ 
ing approach is found in several papers in the graphics 
literature (Barr et al. } 1987), (Terzopoulos Sc Fleischer, 
1988). 

1.3 Physically-based Polygonization of 
Implicit Objects 

In this paper we use physically-based methods to com¬ 
pute polygonal approximations. These methods yield 
naturally adapted polygonizations. They also make it 
possible to construct a model such that the resulting 
polygonization has a natural physical structure associ¬ 
ated with it which can be exploited for physically-based 
simulations. 

Physically based methods in the study of implicit 
surfaces is a very recent research topic. In (Velho Sc 
Gomes, 1991) a spring-mass model is used to construct 
an adapted shell that approximates the geometry of the 
manifold. In (Velho Sc Gomes, 1991a), it is shown how 
this spring-mass shell can be used to do dynamical sim¬ 
ulations with implicit models. In (Figueiredo, 1991), 
physically-based particle systems are used to sample 
points on an implicit manifold; algorithms for struc¬ 
turing such samples provide a powerful technique for 
modeling with implicit surfaces. 

The physically-based approach to constructing piece- 
wise linear approximation of implicit manifolds is re¬ 
lated to the variational methods used to generate adap¬ 
tive numerical grids for the numerical solution of partial 
differential equations (Thompson et a/., 1985). How¬ 
ever, there are two main differences: 

• To our knowledge, the adaptive methods in the nu¬ 
merical grid generation literature are developed for 
structured grids. The problem of polygonization of 
implicit surfaces is a non-structured one. 


• In numerical grid generation, the physics of the as¬ 
sociated problems may drive the adaptation of the 
grid. In our case, the primary interest is in the ge¬ 
ometry and topology of the underlying grid space. 

Our methods can certainly be used to generate adap¬ 
tive numerical grids for problems where the physical do¬ 
main can be defined implicitly. In fact, polygonization 
methods for implicit surfaces seem to be a very attrac¬ 
tive technique for generating non-structured numerical 
grids. 

In this paper, we are interested in polygonizations 
that are regular or “quasi-regular” triangulations. A 
quasi-regular triangulation is a 2-dimensional simplicial 
complex which is constituted by elements that are al¬ 
most equilateral and equiangular. This type of poly¬ 
gonization is desirable in a number of applications to 
modeling and numerical simulation. 

1.4 Overview 

Section 2 describes the two discrete physical systems 
that we use to construct the polygonal approximation. 
Section 3 describes the polygonization algorithm us¬ 
ing physically-based particle systems. Section 4 de¬ 
scribes the polygonization algorithm using a spring- 
mass physical model. Section 5 gives examples and 
makes some comparisons between the two approaches 
described. Section 6 closes with a brief description of 
our current work in this area. 

2 Discrete Physical Systems 

A discrete physical model abstracts matter as an ensem¬ 
ble of particles related to each other by forces. Several 
physical phenomena may be naturally modeled using 
discrete physical systems (Greenspan, 1973). In a dis¬ 
crete physical system the particles interact under the ac¬ 
tion of internal and external forces. The associated mo¬ 
tion equations are easily written as a classical F = ma 
equation of Newtonian dynamics. Simple numerical in¬ 
tegration methods, such as Euler’s method generally 
produce good results. 

In this work, we use two discrete physical models: a 
particle system and a spring-mass system. 

2.1 Dynamic Particle Systems 

A particle system is a finite set of particles which have 
an initial position in space and whose behavior in time 
is governed by algorithmic rules. Particle systems were 
introduced in graphics by Reeves as an algorithmic tech¬ 
nique for modeling fire explosions (Reeves, 1983). In a 
physical particle system , the particles have masses and 
the Newtonian mechanics dictates their dynamical be¬ 
havior. The motion of a particle depends on its mass, 
position and velocity, and on the forces acting on it, ei¬ 
ther by other particles or by the ambient medium. A 
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physical particle system is a discrete physical model as 
defined above. 

Physical particle systems have been used to simulate 
natural phenomena such as waterfalls (Sims, 1990) and 
fireworks (Weil, 1987). These systems in general re¬ 
quire a significant amount of computational effort be¬ 
cause of the number of particles involved. In Section 3 
we shall use a simple physical particle system to com¬ 
pute a polygonal approximation to an implicit manifold. 
More recently, (Szeliski & Tonnesen, 1991) have applied 
physical particle systems to surface modeling. 

2*2 Spring-Mass Systems 

A spring-mass system is a physical particle system 
structured by connecting pairs of particles with springs. 
The springs impose internal forces that depend on the 
distance between these particles and govern the global 
behavior of the system. The resulting structure can be 
represented as a graph, where each particle is a node, 
and two nodes are connected when there is a spring 
joining the corresponding particles. Conversely, each 
graph linearly embedded in the space is naturally asso¬ 
ciated to a spring-mass system — a duality that will be 
exploited in Section 4 for triangulations. 

Spring-mass systems are suitable to create physically- 
based models of deformable objects for dynamical sim¬ 
ulation (Haumann, 1987), (Terzopoulos et al.> 1989). 
In the recent paper (Terzopoulos & Vasilescu, 1991), a 
spring-mass system is applied to adaptive image sam¬ 
pling and surface reconstruction. This approach has 
several connections with our method. 

3 Polygonization using Dynamic 
Particle Systems 

In this section, we describe an algorithm for computing 
a polygonal approximation of an implicit manifold using 
a physically-based particle system (Figueiredo, 1991). 

3.1 Sampling using Dynamic Particle 
Systems 

To properly sample a geometric object we must com¬ 
pute enough points on it so that its geometry can be 
reconstructed from the samples within some tolerance. 
In the case of a manifold given implicitly by a dif¬ 
ferentiable function F : R n —► R, such a computa¬ 
tion requires finding severed solutions of the equation 
f(x) = 0 . Physically-based methods for the solution 
of nonlinear equations have been known for some time 
(Incerti et al., 1979), although it seems that the main in¬ 
terest then was in finding any one solution, and not the 
many solutions that sampling requires. Consequently, 
these methods have not been applied to geometric mod¬ 
eling. 

The particle systems we use for sampling derive their 
dynamics from the potential function | jp|. The particles 


will seek equilibrium positions on the manifold F'~ 1 ( 0 ) 
because these are positions of minimum potential en¬ 
ergy. If the gradient of F is non-singular, then these 
are the only equilibrium positions. 

This interpretation of the gradient of |F| as a force 
field implies the following equation of motion for a unit 
mass particle: 

^f+7^+sign( J F)VF = 0, (2) 

where 7 is a positive real number representing friction 
proportional to velocity. (Incerti et al., 1979) have pro¬ 
posed a similar differential equation for finding zeros of 
functions R a —► R n . 

3.2 Structuring Samples 

The samples obtained by simulating the physics of par¬ 
ticle systems have no structure other than the equilib¬ 
rium position of each particle. Moreover,the samples 
are not evenly distributed across the surface, but rather 
tend to concentrate around points of high curvature. 
While this could be exploited for investigations on the 
geometry of the surface, a polygonal approximation in¬ 
terpolating such samples will rarely be quasi-regular. 

In order to obtain a quasi-regular approximation, the 
sample is subjected to a relaxation process similar to 
the one used by (Turk, 1991) and (Szeliski & Tonnesen, 
1991): particles repel each other with an intensity that 
rapidly decreases as the distance between the particles 
increases. Moreover, the movement of each particle is 
constrained to stay close to the surface by projecting 
repulsion forces onto the tangent plane. 

The result of this relaxation process is a more uniform 
sampling of the surface. The desired polygonal approx¬ 
imation is then obtained by computing the Delaunay 
triangulation associated with the points and choosing 
the triangles that approximate well the tangent planes 
at each of its vertices. 

4 Polygonization using Spring-Mass 
Systems 

In this section, we describe a method to construct a 
polygonal approximation to an implicit manifold using 
a spring-mass system. 

4.1 Subordinated Triangulation 

Initially we define a system of spring-mass elements as¬ 
sociated with a Freudenthal triangulation of the space. 
Like the particle systems described in Section 3.1, this 
system is subjected to deformation forces derived from 
the gradient field of the implicit manifold. Its equilib¬ 
rium position gives a triangulation of a region of the 
space that contains the manifold M and has the follow¬ 
ing properties: 

• M is transversal to the triangulation; 
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• The simplices are quasi-regular; 

• For each n-simplex a that intersects M there exists 
a point p 6 M close to the barycenter of <r such that 
the tangent space of M at p is close to the support 
hyperplane of one of the faces of a. 

Figure 2 illustrates the properties above in two dimen¬ 
sions. A triangulation with these properties is said to be 
subordinated to the surface M (Velho & Gomes, 1991). 



Figure 2 : Subordinate triangulation 


4.2 Mesh Generation 

The spring-mass lattice generation process requires the 
following steps: 

1 . A Freudenthal triangulation is created within a vol¬ 
ume bounding the implicit manifold; 

2 . Each simplex of the triangulation that intersects 
the implicit manifold is identified. Together, they 
form an intersecting simplicial complex; 

3. A spring-mass system is created by associating 
mass nodes and springs to the vertices and edges 
of the intersecting complex. 

The construction of the Freudenthal triangulation in 
step 1 is obtained as explained in Section 1.1. The 
identification of the relevant simplices in step 2 is done 
through a classification of the simplicial cells by test¬ 
ing the sign of the implicit function at the vertices of 
each simplex. Assuming that the uniform grid is suffi¬ 
ciently fine, if the signs are the same for all vertices, the 
simplex must be totally inside or totally outside of the 
manifold M . If the signs are different, then the simplex 
must intersect the surface M. 


4.3 Mesh Deformation 

After generating the mesh we use a physically-based 
approach in order to obtain the final triangulation that 
will be used for the polygonization of M . The dynamic 
simulation submit the spring-mass system to deforma¬ 
tion forces with the purpose of conforming it to the 
shape of the implicit manifold. The process takes into 
account the internal forces produced by the springs as 
well as external deformation forces. 

The external forces are based on information de¬ 
rived from the geometry of the implicit manifold. More 
specifically, two opposite attracting and repulsing force 
fields are generated using the gradient vector field of 
the implicit manifold. One field defined inside a small 
neighborhood of the object’s boundary generates re¬ 
pelling forces that prevent points from being too close 
to the surface. The other force field, defined outside 
this neighborhood, generates attraction forces that pulls 
points towards the surface. 

In order to facilitate the relaxation of the mesh struc¬ 
ture into the desirable configuration, the initial rest 
length of the strings is made smaller than the initial 
grid spacing. This means that we start the process with 
a tensioned mesh that moves to a rest position under 
the action of internal and external forces. 

4.4 Polygonization 

The polygonization of the implicit manifold M is now 
obtained using the same technique of Allgower’s algo¬ 
rithm described in Section 1 . 1 : since the triangulation 
obtained is subordinated to M, the manifold intersects 
each 3-simplex a in at most 4 distinct points, each one 
located on a different 1 -dimensional face. Therefore, 
the linear approximation to M inside a is formed by 
one or two triangles ( 2 -simplices), The set of all these 
simplices constitute the combinatorial manifold that ap¬ 
proximates M. We shall illustrate the method with 
some examples in section 5.1. 

5 Results 

In this section, we show the result of applying the 
two methods described in Sections 3 and 4 to compute 
polygonal approximations of implicit surfaces. We also 
make a comparative analysis of the polygonizations ob¬ 
tained and discuss the differences and similarities be¬ 
tween the two methods. 

5.1 Examples 

Figures 3 and 4 illustrate the polygonization method us¬ 
ing the particle systems presented in Section 3 . Figure 
3-a shows the trajectories of a particle system associated 
with a two-dimensional curve with 2 connected compo¬ 
nents described by the implicit equation y 2 — x 3 + x = 0 . 
Figure 3-b shows the final equilibrium positions of these 
particles along the curve. Figure 4-a shows the sample 
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points on the surface of the sphere x 2 + y 2 -f z 2 = 1. 
Figure 4-b shows the polygonal approximation for the 
sphere. 

Figures 5 to 7 illustrate the polygonization using the 
spring-mass system method presented in Section 4. Fig¬ 
ure 5 demonstrates the mesh deformation process for 
the cylinder x 2 -{- y 2 = 1 . Figure 5-a depicts the initial 
mesh created from a Freudenthal triangulation of the 
ambient space, Figure 5-b shows the final mesh in its 
equilibrium position. It is apparent that the mesh was 
constrained to lie in a tubular neighborhood of the im¬ 
plicit surface, conforming to the cylinder’s shape. The 
polygonal approximation is obtained from this deformed 
mesh. 

Figure 6 shows a detail of the polygonization asso¬ 
ciated with the spring-mass mesh before (a) and after 
(b) the deformation process. Note how the deforma¬ 
tion of the mesh produces a very homogeneous polygon 
structure, transforming long, thin elements to nearly 
equilateral ones. This is because the triangulation re¬ 
sulting from the dynamical simulation is subordinate 
to the surface; as a consequence, the associated polygo¬ 
nization is quasi-regular. 

Figure 7 shows the final polygonal approximation for 
the cylinder. 

5.2 Comparisons 

The main difference between the two methods presented 
in this paper is related to the order in which the op¬ 
erations of sampling and structuring of points on the 
implicit surface are performed. 

The dynamical particle systems method in Section 3 
first generates samples of the implicit object and sub¬ 
sequently structures these samples in order to create a 
polygonal approximation of the object. 

The spring-mass systems method of Section 4 does 
the opposite. First the structure is created from a reg¬ 
ular tessellation of space and second, this structure is 
used to sample the implicit object. 

It is interesting to note that the physically-based ap¬ 
proach is applied only to the sampling process. The 
structuring operation involves combinatorial methods. 

The two methods produce equally good polygonal ap¬ 
proximations of implicit surfaces. The combinatorial 
manifold generated by them is constituted by “almost 
fat” triangles. 

The dynamical systems employed in both methods 
are very stable. The convergence to an equilibrium state 
is in general reasonably fast, requiring a small number 
of time steps (usually less than 100 ). 

6 Conclusions 

We have presented a new approach for the polygoniza¬ 
tion of implicit surfaces based on physically-based meth¬ 
ods. The two methods described'exploit different strate¬ 
gies to obtain polygonizations that are quasi-regular 


and faithfully approximate the original implicit objects. 

The use of a physically-based approach for the poly¬ 
gonization of implicit objects provides great flexibility 
and control of the resulting structure. 

Although this process is computationally more ex¬ 
pensive than traditional methods, due to the numerical 
simulation of a dynamical system, it produces qualita¬ 
tively better results. 

We are presently incorporating these polygonization 
methods in a modeling and animation system for im¬ 
plicit objects. 

Our current research also includes the development of 
adaptive physically-based polygonization methods and 
the application of these methods to numerical grid gen¬ 
eration problems for domains defined by implicit sur¬ 
faces. 

In relation to the method of Section 3, we are in¬ 
vestigating higher order approximations using intrinsic 
Voronoi diagrams. This would enable us to do continu¬ 
ous deformations using spline patches. 

7 Acknowledgements 

This research was partially supported by CNPq - 
Conselho Nacional de Desenvolvimento Cientifico e 
Tecnologico, the Natural Sciences and Engineering Re¬ 
search Council of Canada and the Information Technol¬ 
ogy Research Center of Ontario. 

8 References 

Allgower, E. L. & Schmidt, P. H., (1985): An algorithm 
for piecewise-linear approximation of an implicitly de¬ 
fined manifold, SIAM Journal of Numerical Analysis, 
22, 322-346. 

Barr, A., (1986): Ray Tracing Deformed Surfaces Com¬ 
puter Graphics , 20 , 4, 287-296, (Proceedings of SIG- 
GRAPH ’ 86 ). 

Barr, A., Barzel, R., Haumann, D., Kass, M., Platt, 
J., Terzopoulos, D., Witkin, A., (1987): Topics 
in Physically-based Modeling , SIGGRAPH’87 course 
notes #17. 

Bloomenthal, J., (1988): Polygonization of implicit sur¬ 
faces, Computer Aided Geometric Design , 5, 341-355. 

Figueiredo, L. H., (1991): Computational Morphology 
of Implicit Curves , Ph.D. thesis in preparation, IMPA. 

Greenspan, D., (1973): Discrete Models, Addison- 
Wesley, Reading, MA. 

Hall, M. & Warren, J., (1990): Adaptive polygonization 
of implicitly defined surfaces, IEEE Computer Graphics 
& Applications, 10 , 33-42. 

Hanharan, P., (1983): Ray TVacing Algebraic surfaces 
Computer Graphics , 13, 1 , 83-90, Proceedings of SIG- 
GRAPH ’83. 



Graphics Interface ’92 



255 



Figure 3: Trajectories (a) and final positions (b) of particles for 2D curve 



( a ) 


(b) 


Figure 4: Sample points on the surface of a sphere (a) and polygonization of the sphere (b) 


Graphics Interface ’92 
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(b) 


Figure 6: Detail of the polygonization before (a) and after (b) deformation of the mesh 
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Figure 7: Final polygonization of the cylinder 
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Abstract 

General 3x3 linear or 4x4 homogenous matrices can be 
formed by composing primitive matrices for translation, 
rotation, scale, shear, and perspective. Current 3-D 
computer graphics systems manipulate and interpolate 
parametric forms of these primitives to generate scenes and 
motion. For this and other reasons, decomposing a 
composite matrix in a meaningful way has been a long¬ 
standing challenge. This paper presents a theory and 
method for doing so, proposing that the central issue is 
rotation extraction, and that the best way to do that is Polar 
Decomposition. This method also is useful for renormal¬ 
izing a rotation matrix containing excessive error. 

R6sum6 

Des matrices correspondant h des transformations lineaires 
en 3 dimensions, ou bien a des transformations homogenes 
en 4 dimensions, peuvent etre construites en composant des 
matrices qui decrivent des transformations elementaires: 
d^placement, rotation, homothetie, glissement, et perspec¬ 
tive. Les systemes actuels de visualisation graphique a trois 
dimensions manipulent des formes paramdtriques de ces 
transformations elementaires, pour recreer des scenes et des 
mouvements. II en decoule Tinteret de trouver des decompo¬ 
sitions pratiques de matrices composees. Nous pr^sentons 
ici une technique pour trouver de teiles decompositions. Le 
probleme fondamental est l’extraction des rotations, et nous 
demontrons qu’une decomposition polaire est la methode de 
choix. Cette methode est aussi utile quant il faut 
renormaliser une matrice de rotation qui contient des erreurs 
excessives. 

Keywords: homogeneous matrix, matrix animation, 

interpolation, rotation, matrix decomposition, Polar 
Decomposition, QR Decomposition, Singular Value 
Decomposition, Spectral Decomposition, greedy algorithm 

Introduction 

Matrix composition is well established as an important part 
of computer graphics practice and teaching [Foley 90]. It is 
used to simplify and speed the transformation of points, 
curves, and surfaces for modeling, rendering, and animation. 


Matrix decomposition—the focus of this paper—is less 
well known in computer graphics. It is useful for a variety of 
purposes, especially animation and interactive manipul¬ 
ation. 

The usual transformations of an object can be described by 
3 x4 affine matrices; but the 12 entries of such a matrix are 
not very meaningful parameters. To understand, much less 
modify, matrices requires a good decomposition. Any 
decomposition must account for all 12 degrees of freedom 
(16 for 4x4 matrices) in the independent parameters of the 
primitives used. A decomposition that provides too few 
parameters will not be able to handle all inputs, while one 
that provides too many will not be stable and well-defined. 
The greatest problem, however, is ensuring that the 
decomposition is meaningful. 

Most widely used 3-D animation systems, typified by 
Stern’s bbop at NYIT [Stern 83], Gomez’s twixt at Ohio 
State [Gomez 84] and Duffs md at Lucasfilm (later Pixar) 
allow the parameters of primitive transformations to be set 
interactively at key times, and compute transformations at 
intermediate times by spline interpolation in parameter 
space. Sometimes, however, only a composite matrix is 
available at each key frame, or more design flexibility is 
needed than that allowed by a hierarchy of primitive 
transformations. It is possible to interpolate the entries of a 
composite matrix directly, but the results are usually 
unsatisfactory. Decomposition allows the use of standard 
interpolation methods, and can give much better results. 
Matrix animation is discussed in more detail below. 

Most authors have considered decomposition with less 
stringent criteria than ours. A common motivation is the 
need to synthesize an arbitrary matrix from a limited set of 
primitives, without regard for meaningfulness of the 
decomposition [Thomas 91]. Typically, these methods rely 
on a sequence of shears [Greene 86], and give factors that 
depend on the coordinate basis used. Shears are one of the 
less common operations in graphics, and a sequence of 
shears is a poor choice for animation. In contrast, the 
decomposition we propose has a simple, physical, coor¬ 
dinate independent interpretation, preserves rigid body 
motion as much as possible, and animates well. 
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Composition and Decomposition 

Three types of matrix are commonly used for 3-D graphics: 
3x3 linear, 3x4 affine, and 4x4 homogeneous; similar types 
with one less column and row are used for 2-D graphics. The 
homogeneous matrix is most general, as it is able to 
represent all the transformations required to place and view 
an object: translation, rotation, scale, shear, and perspec¬ 
tive. Any number of transformations can be multiplied to 
form a composite matrix, so that a point can be transformed 
from model space coordinates to screen space coordinates in 
a single step.Generally, however, perspective is treated as a 
separate step in the viewing process—because lighting 
calculations must be done first—and not used for modeling 
or object placement. All the transformations except per¬ 
spective can be accomodated by an affine matrix, which, in 
turn, can be considered just a 3x3 linear matrix with a trans¬ 
lation column appended. (Following [Foley 90], we write 
points as column vectors, [x yz 1 ] T , which are multiplied 
on the left by the matrix.) 
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Figure 1. Primitive Transformation Matrices 


Each primitive transformation has a more meaningful and 
concise representation than its matrix: a vector for most, a 
quaternion for rotations. H is not too difficult to recover the 
concise form if the matrix for the primitive is available 
[Goldman 91][Shoemake 91]. Once primitives have been 
multiplied into a composite matrix, however, recovery is 
generally impossible. Even so, a great deal can be done, as 
we shall see. 


Primitive recovery is difficult for three reasons: absorption, 
order, and interaction. The first two problems are intrac¬ 
table; the third is the focus of this paper. Absorption is 
simple: a sequence of translations gives a result which is 
indistinguishable from a single translation, or from any 
number of different sequences; the same is true of other 
primitives. Order is also simple: the effect of a translation 
followed by a scale could as easily be achieved by com¬ 
posing primitives in the opposite order, likewise for other 
pairs. Interaction is more subtle: most transformations 
change all columns of the matrix, so scaling (for example) 
affects translation; all pairs of primitives interact. Notice 
any shear can be achieved by combining rotation and scale. 


While absorption and order cannot be unscrambled, they can 
be standardized; for animation and other applications of 
interest, this usually suffices. Absorption can simply be 
ignored; that is, no attempt is made to tease apart a trans¬ 
lation (except perhaps into jc, y, and z components). Order 
is handled most easily by assuming a canonical order, such 
as Perspective of Translation of Rotation of Scale of object. 
Which canonical order is chosen is partly a matter of taste; 
this particular one makes translation trivial to extract, and 
places perspective in the order expected for a transformation 
to camera coordinates. If more information is made avail¬ 
able in a particular situation, it may be possible to improve 
upon these standard assumptions; for example, it may be 
known that only x translation took place, or that scaling 
was done last. Such special case extraction is outside the 
scope of this paper. 

Rigidity and Rotation 

A perspective matrix of the form given above is easy to 
extract as a left factor of a composite homogeneous matrix, 
C = PA, with non-singular 3x3 corner; the details are left as 
an exercise for the reader. Notice that the usual perspective 
matrix includes translation and scale; we have chosen the 
minimal form necessary to reduce C to an affine matrix.^ 
Likewise, a translation is easy to extract as the left factor of 
the remaining affine matrix, A =TM; simply strip off the 
last column. The matrix M then essentially will be the 3x3 
matrix of a linear transformation. It would be simplest not 
to factor M at all, but to animate its entries directly. The 
results of this overly simple approach are visually discon¬ 
certing, but worth investigating. 

Direct matrix interpolation treats each component of the 
matrix separately, and creates intermediate matrices as 
weighted sums of nearby key matrices. For example, linear 
interpolation between keys and M 2 uses (l-/)M]+rM 2 , 
while cubic spline interpolation uses affine combinations, 
a|M|+a 2 M 2 +a 3 M 3 -KX 4 M 4 , with ctj+cx 2 +(X 3 +(X 4 = 1. The 
results of this approach are immediately deduced from the 
linearity of matrix multiplication. 

Proposition: A point transformed by a 
weighted sum of matrices equals the weighted sum 
of the transformed points; that is, 

<X “< M /) P = X a i /»)• 

i i 

An example of this behavior can be seen in Figure 2, where 
the chin and hat back move steadily along a line from initial 
to final position, as do all the points. (We will use planar 
examples because they are easier to interpret on the page, 
but illustrate the same issues as spatial examples.) Notice 
that the interpolated matrix twice becomes singular as the 


t But a permutation matrix may also be needed to provide 
pivoting for what is, in effect, a block LU decomposition. 
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image appears to flip over. At any moment of singularity 
the image will collapse onto a line (or worse, a point). 



Figure 2. Direct Matrix Interpolation 

Consider a square centered at the origin, and two key 
matrices: the identity and a 180° rotation. Since there are 
only two keys, only linear interpolation makes sense. 
Then, however, the theorem implies that each comer of the 
square will move linearly to its rotated position, which is 
diagonally opposite; the square will collapse through the 
origin. Although the distortions diminish with smaller 
angles of rotation, the square loses its shape. We expect 
rotations to transform the shape rigidly; direct matrix 
interpolation fails to do so. On the other hand, there is no 
problem with matrices for translation, scale, or shear. 

Experiments with apparent motion (flash one image, flash 
another, see motion) suggest that the human visual system 
infers rigid motion as much as possible [Carlton 90] 
[Shepard 84]. Rotation is the only rigid transformation that 
is distorted by direct matrix interpolation. It therefore 
seems reasonable to conclude that the central problem for 
matrix animation is to extract a rotation in the best 
possible way, so that it can be interpolated as a rotation. 

Decomposition Methods 

Rotation matrices have simple defining properties: each 
column is a unit length vector which is perpendicular to the 
others, and the third column is the cross product of the first 
two. (Rows satisfy the same properties.) The first two 
properties are those of orthogonality , and can be summa¬ 
rized as Q t Q = I; the last makes the orthogonality special , 
and can be stated as det(Q) = +1, Orthogonality alone 
implies that the determinant must be either +1 or -1, with 
the latter indicating the presence of a reflection in the 
matrix. A 3x3 orthogonal matrix with negative determinant 
can be converted to a pure rotation by factoring out a -I. 

Numerical analysts have developed a number of algorithms 
for orthogonal matrices [Golub 89] [Press 881, in large part 
because orthogonality limits the accumulation of numerical 
error. Given a square—and presumably non-singular— 
matrix, three promising orthogonal decompositions are 
available: QR decomposition. Singular Value Decompo¬ 
sition (SVD), and Polar Decomposition. The QR factors of a 


matrix M = QR are, respectively, orthogonal and lower 
triangular. The SVD gives three factors, M = UKV T , with 1) 
and V orthogonal and K diagonal and positive. The less 
common Polar Decomposition, M =QS. yields an or¬ 
thogonal factor and a symmetric positive definite factor. 
The latter two decompositions can factor singular matrices, 
with “positive” replaced by “non-negative” in the factors. 

More than one algorithm is available to compute each 
decomposition. The oldest and best-known method for QR 
Decomposition is called Gram-Schmidt orthogonalization. 
Each row of the matrix is considered in turn, with each 
divided by its magnitude to give a unit vector, then project¬ 
ed onto the remaining rows to subtract out any parallel 
component in each of them. A better method is to accumu¬ 
late Householder reflections, orthogonal transformations 
which can zero out the elements above the diagonal. 

There is no simple SVD algorithm. The most common 
approach is first to use Householder reflections to make M 
bidiagonal, then to perform an iteration involving QR 
Decomposition until the off-diagonal entries converge to 
zero. While this is numerically reliable, it is complicated to 
code, and by no means cheap. 

It is possible to compute a Polar Decomposition using the 
results of SVD, suggesting great cost; but a simpler method 
is available [Higham 86]. Compute the othogonal factor by 
averaging the matrix with its inverse transpose until 
convergence: Set Q () = M, then Q, + | = '^(Q,* Q, _T ) until 
Q, + l -Q,= 0. This is essentially a Newton algorithm for the 
square root of I, and converges quadratically when Q, is 
nearly orthogonal. Finding the Q factor of a 2x2 matrix is 
easy. Suppose 

-os> 

then 

Q = M + sign(det(M))^ J 
scaled by a factor that makes the columns unit vectors. 

Polar Decomposition Advantages 

Care is needed in choosing among the possibilities, since 
the purposes of numerical linear algebra are different from 
those of computer graphics. The worst of the three choices 
seems to be SVD: it is the most expensive to compute, and 
the orthogonal matrices it produces are practically useless. 
A matrix which is already a pure rotation can be factored in 
an infinite variety of ways into the two orthogonal matrices 
of the decomposition, which is disastrous in the context of 
matrix animation. Small perturbations of the input matrix 
can cause different orthogonal factors to be chosen, even 
though the set of singular values is stable. Interpolating 
unreliable matrices will produce erratic results: consider the 
following two decompositions. 
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Although both are perfectly valid decompositions, inter¬ 
polation of the primitives will give visible distortions 
—not at all what the user expects! Floating point variations 
in the least significant digit can cause an SVD algorithm to 
choose the first decomposition for one key, and the second 
for the next. Many SVD routines order the singular values 
by magnitude, which only exacerbates the problem. There 
seems to be no way to avoid having small input changes 
cause large output changes. 

QR Decomposition is a much better choice, though it still 
presents problems. Unlike the factors of SVD, the QR 
factors can be determined uniquely, and are stable under 
small perturbations. Also, the algorithms for QR are simple 
and efficient. The drawback is that the orthogonal matrix 
extracted is not particularly meaningful; it is not inde¬ 
pendent of the coordinate basis used, and so has no 
“physical” significance. That is, if the matrix M is given in 
a rotated and uniformly scaled basis M'= coordi¬ 

nate independent factors would have the form Q' = BQB" 1 
and R' = BRB~ l ; but the latter is no longer a lower 
triangular matrix, since that property is not preserved under 
similarity transforms. This is unfortunate for animation 
purposes, because it makes results much less predictable. 
Suppose, for example, M is constructed by rotating then 
scaling; although the Q factor might be expected to capture 
the rotation, it will not. Only when M is constructed by 
scaling then rotating will QR recover the original factors. 

The Polar Decomposition factors are unique, coordinate 
independent, and simple and efficient to compute. Further¬ 
more, the orthogonal factor Q is the closest possible 
orthogonal matrix to M, a property which is also coor¬ 
dinate independent. That is, Q satisfies the following 
conditions. 

Find Q minimizing 11 Q-M lip 

subject to Q T Q -1 = 0, 

where the measure of closeness, the Frobenius matrix norm 
squared, is 

II Q-M lip = £ ) 2 . 

u 

Since this important claim appears in [Higham 88J without 
proof, a proof is given in the Appendix. When M has 
positive determinant, Q will be a pure rotation, otherwise it 
will include a reflection. It might seem preferable to exclude 
reflections, but there is no well-defined nearest rotation. For 
example, every 2-D rotation is equally distant from every 
2-D reflection. (Polar Decomposition is applicable to 
matrices of any size and shape.) A rotation has the form 


), with C 2 +.V 2 =I, while a reflection is C-.0 , with 

a 2 +b 2 = I. The sum of the squares of the differences is 
(c-a) 2 +(-s-b) 2 +(s-b) 2 +(c+a) 2 = 2(c 2 +s 2 )+2(a 2 +b 2 ) = 4. As 
noted earlier, however, a 3x3 Q matrix which includes a 
reflection (indicated by a negative determinant) can be 
factored as Q = R(~I). 

Closeness also makes Polar Decomposition good for matrix 
renormalization. Moderate amounts of numerical noise can 
be removed in a single iteration of the averaging algorithm. 
This improves and formally grounds [Raible 90]. 

The combination of uniqueness and closeness guarantees 
that small input perturbations will not produce large output 
variations. The Q factor of Polar Decomposition appears to 
be the best possible rotation. What, then, is the S factor? 
As the appendix shows, in some rotated coordinate system 
S is diagonal—in other words, a scale matrix. This form of 
scaling is preserved through coordinate changes, and has a 
good claim to being a new primitive, stretch. The S factor 
can move to the other side of the Q factor without changing 
form, though its value will change to Q T SQ. Thus Polar 
Decomposition has a very physical interpretation. 




Figure 3. Physical View of Polar Decomposition 

One drawback of Polar Decomposition is that there is no 
explicit representation of shear. As explained earlier, 
interaction is to blame; shear will be factored as rotation 
and stretch. In two dimensions, for example, a simple shear 
will factor as 




h 

2 +/ 7 2 


)->— 

' V 4 +/r 


= QS. 

As Figures 4 and 5 show, the appearance of a factored anima¬ 
tion can be quite different from that of a direct animation for 
shear. Nevertheless, factorization gives a reasonable result. 
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Figured Direct Shear Interpolation 


Direct Stretch Animation 

Although S can be factored into diagonal form, S = UKU T 
(using a symmetric eigenvalue routine [Golub 89] 
[Carnahan 69]), as with SVD the factorization is not unique. 
This unavoidable indeterminacy combined with small 
numerical errors could cause different U’s to be chosen at 
different keys, and the resulting interpolation would suffer 
greatly. Fortunately, however, S matrices can be 
interpolated directly, and will preserve their form and 
meaning. That is, a | S |+a 2 S 2 +... yields a symmetric 
matrix, which for non-negative a,- will also be positive defi¬ 
nite. so it is not necessary to choose a diagonalizing 
rotation U. If some U diagonalizes both Sj and S 2 simul¬ 
taneously, then ctiSi+a 2 S 2 = U(ot|K|+a 2 K 2 )L :T , so direct 
interpolation simply interpolates the scale factors, as 
desired. Weights a, for interpolation will usually include 
negative values (to ensure smooth motion), so the 
interpolated S matrices can become singular; but the same 
thing can happen with pure scale matrices. In both cases 
this does not seem to be a serious problem, and can be 
solved using spline tension. 


0 



Figure 6. Polar Decomposed Matrix Interpolation 
Factored Stretch Animation 

Diagonalization of S as UKU T is still a useful alternative if 
it can be stabilized across keys. (Even without sta¬ 
bilization, an interactive user interface will certainly deal 
with stretch in factored form.) So in this section—which 
can be skipped on first reading—we consider the following 



Figure 5. Decomposed Shear Interpolation 


problem: Given two stretch matrices, Sj and S 2 , 
interpolated in that order, how can their diagonalizing 
rotations, Uj and U 2 , be chosen to be as similar as 
possible? More precisely, if the rotation taking U| into U 2 
is designated by U| 2 = U| T LJ 2 , the problem is to minimize 
the absolute angle of rotation performed by U 12 . Further¬ 
more, so that the results can easily be generalized to a series 
of matrices S,, let Uj be fixed. (Then fix U 2 while minimiz¬ 
ing U 2 3 , and so on. Begin with Uq = I.) 

There are three cases, depending on how many identical 
values occur on the diagonal K 2 . When all three values are 
the same, it is possible to set U 2 = Uj. Uniform scaling is 
common in computer graphics practice, and is easily 
delected by inspection of S 2 , which will already be diagonal 
with identical values. When all three values are different, we 
have 24 choices for U 2 . These are obtained by all axis 
permutations (6), times all axis sign combinations (8). 
achievable by a rotation (divide by 2). When exactly two 
values are the same, we have an extension of the all different 
case: free rotation is allowed around one of the axes. The 
last two cases are discussed more fully below-. 

An easy way to measure the rotation Ui 2 is to convert it into 
a unit quaternion. ([Shoemake 85] introduces unit quater¬ 
nions as a representation of 3-D rotation and discusses how 
to interpolate them.) Its real component is cos(0/2), where 
0 is the total rotation angle. Picking U 2 to maximize the 
quaternion’s real component minimizes the angle. 

There is a quick way to do this maximization. Let q be the 
quaternion corresponding to U )2 . The 24 variations corre¬ 
spond to qp< where p is one of 48 quaternions (including 
both p and -/?) that map the coordinate axes into 
themselves: p - [.v y r vr] can be one of [0 0 0 ±1], 

[0 0 ±1 ± 1 ]/ V 2 , [±1 ±1 ±1 ±l]/2. ora permutation of these. 
The real part of qp is nv^.v / ,-vv / , which is simple 
to maximize because of the simple form of each p. We take 
the absolute values of the components of q , sort them, and 
choose the maximum of either the largest, or half the sum of 

all four, or 1 pfl times the sum of the two largest. Then we 
can work backwards from our choices to deduce the 
corresponding p . 
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If exactly two of KVs values are the same, we have a con¬ 
tinuous optimization. As before, we are free to permute the 
axes, but we have the additional freedom to rotate by any 
angle in the plane of equal scaling. We can arrange for the 
equal values to be the first two, so that a change of 
coordinates rotating around the z axis leaves K 2 unchanged. 
So our problem is to pick p and r to maximize the real 
component of qpr y where p is one of the 48 quaternions 
above and r= [0 0 s r[, with r 2 +.r 2 =l, is a quaternion that 
rotates about the z axis. 

The product of a quaternion \x y z w | with r = [0 0 s c\ is 
[at+v.v yc-xs zc+ws wc-zs]. Choosing v = u'/V vv 2 +z 2 and 

s = -z/V vv 2 +z 2 maximizes the real component to V u ? 2 +z 2 . 
Consequently, the best p is one that maximizes vv 2 +z 2 . Only 
six values of p give essentially different results. These are 
[0 0 0 I], fl 1 1 IJ/ 2 , [1 1 l — 11/2 and each of these times 
[10 0 0 ). Summing the squares of the w and z components 
from the product of q with each of these and subtracting V 2 

gives ±(w 2 +z 2 - 1 ±(xz-wy) and ±(w.v+yz). Choose the p 
corresponding to the largest positive value, and if the 
negative sign was used, post-multiply p by [1 0 0 Oj. 

This method for stabilizing the S decomposition is a greedy 
algorithm. It extends partial solutions at each stage by 
finding an optimal continuation, with no backtracking. 
There is no guarantee that this produces a global optimum— 
a locally inferior choice could possibly be warranted 
because it allows better choices further on that more than 
compensate. However, we can prove the following: 

Proposition: Given a sequence S,-of symmetric, 
positive definite matrices, none of which has a di- 
agonalization with exactly two equal values, the 
greedy algorithm given above picks a sequence of 
rotation matrices U, that minimizes the sum of the 
rotation angles between adjacent rotations. 

The proof depends on two observations. First, the S, with 
three equal values do not affect the sum; and second, the 
axis-permuting rotations p form a group. With this in mind, 
let (pi) be the greedy sequence of permutations, and (P,) the 
optimal sequence. Suppose now that some p k * P k . Then the 
discrcpency 8 = P~ k p k is in the group, and can post-multi¬ 
ply every P h i > k without increasing the angle sutn. For P k 
is replaced by p ky which by definition of the greedy sequence 
gives the smallest angle possible at that step; and none of 
the other angles change, since 8“ l ry*) l t/ r+ j8 has the same 
angle as the original j. So (/;,) is also optimal. 

With double values, however, some greedy sequences are not 
optimal. In mitigation, we point out that floating-point 
arithmetic stands between us and any reliable determination 
of equality of values, and that the additional freedom offered 
by equal values only causes the greedy algorithm to find 
solutions with smaller total rotation. Furthermore, the 
global optimization problem in the general case is a mixed- 


integer programming problem of the sort that is often NP- 
complete. (But we make no claims as to the status of this 
particular problem.) 

Lest this extended discussion leave the wrong impression, 
we point out that diagonalization has not been necessary in 
our experience. The animations achieved by direct S 
interpolation look as good as those using the more 
elaborate procedure. (Also, the code required is much shorter 
than the discussion.) Since the developer of an animation 
system may choose not to introduce our new stretch 
primitive, however, we have offered a reasonable 
alternative. 

Conclusions 

With the assistance of Polar Decomposition, a non-singular 
4x4 homogeneous matrix M can be factored into 
meaningful primitive components, as 

M = PTRNS, 

where P is a simple perspective matrix, T is a translation 
matrix, R is a rotation matrix, N is ±1, and S is a symmetric 
positive definite stretch matrix. The stretch matrix can 
optionally be factored, though not uniquely, as UKU T , 
where U is a rotation matrix and K is diagonal and positive. 
For a 4x3 affine matrix the perspective factor can be 
dropped; and for a 3x3 linear matrix, so can the translation. 
Also, N can be multiplied into S if desired. 

Polar Decomposition produces factors QS which are unique, 
coordinate independent, and both simple and efficient to 
compute. The factors have a physical, visual interpretation 
not found with other decomposition methods. The PTRNS 
decomposition is useful for a variety of purposes, including 
matrix animation and interactive interfaces. It has the minor 
disadvantage that it does not directly represent shear. 
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Appendix 

Theorem: The Polar Decomposition factor Q is the closest 
possible orthogonal matrix to M, with closeness measured 
using the Frobenius matrix norm. That is, Q satisfies the 
following conditions. 

2 

Find Q minimizing II Q~M lip 
subject to Q t Q -1 = 0, 

where 

II Q-M lip = £ ( qij-my ) 2 . 

Proof: Though expressed in matrix terms, the proof 
simply requires Finding the minimum of a quadratic function, 
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which we learned lo do in calculus by finding where the 
derivative is zero. We can express IlMllp as the diagonal 
sum—the trace—of M T M, and incorporate the orthogo¬ 
nality constraint as a linear term using a symmetric 
Lagrange multiplier matrix Y. So, as the reader can verify, 
we can differentiate 

trace [ (Q-M) T (Q-M) + (Q T Q - I)Y] 
with respect to Q and equate to zero to obtain 
2(Q-M) + 2QY =0 
which simplifies to 
Q(I+Y) = M. 

Thus M will be factored as our desired Q times a symmetric 
S = I+Y. 

M = QS. 

This factorization is the Polar Decomposition of M. To use 
it we need to solve for S in terms of M. Since Q T Q = I, we 
must have 

(MS-‘) t (MS- | ) = I. 

A symmetric S has a symmetric inverse, so this simplifies 
to 

s- , m t ms -1 = I, 

and-finally to 

s 2 = m t m. 

Now. M t M is guaranteed to be symmetric and positive 
definite (or semi-definite if M is singular), and so there is a 
similarity transform that makes M T M diagonal, with 
positive (or zero) real entries. This gives the Spectral 
Decomposition of S 2 . 


S 2 = UKU t ; 


U r U = I, 


K = 


*> 

0 

0 


0 0 ^ 

k 2 0 

0 Kj j 


Kf > 0 . 


Taking either the positive or negative square root of each 
diagonal element of K, we obtain eight candidates for S, 



However, for Q to be a minimal solution, the second 
derivative. 2(I+Y) = 2S, of our function must be positive 
definite, which means only the positive square roots are 
allowed, and so S is uniquely determined. For any M which 
is non-singular, Q is also uniquely determined. 
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Abstract 

The automatic generation of motion for animation re¬ 
mains an unsolved problem in computer graphics. One 
approach to the problem is to combine physically ac¬ 
curate models with control systems. The user speci¬ 
fies high-level goals and the control system computes the 
forces and torques that the simulated muscles or motors 
should exert to cause the model to perform the desired 
task. In this paper we describe control systems for rigid- 
body models of humans performing four tasks: pumping 
a swing, riding a seesaw, juggling, and pedaling a uni¬ 
cycle. We designed the control systems with the goal 
of producing natural-looking motion, and we discuss the 
techniques that we used to achieve this goal. 

Keywords: Animation, Simulation, Control Theory. 

Introduction 

We would like to be able to automatically generate 
natural-looking motion for computer animations based 
on high-level input from the user. We have explored 
one solution to this problem: combining control systems 
with physically accurate models. The user specifies a 
high-level goal (“ride the unicycle from here to there”) 
and the control system computes the forces and torques 
that will cause the simulated model to perform the de¬ 
sired task. The combination of a carefully designed con¬ 
trol system and a realistic physical model can produce 
natural-looking motion that has much in common with 
the motion of the animal or human on which it was mod¬ 
eled. 

Physical simulation has been used successfully for gen¬ 
erating realistic motion of passive systems (Barzel and 
Barr 1988; Hahn, J. 1988; Terzopoulos and Fleischer 
1988; Terzopoulos and Witkin 1988; Baraff 1989, 1991; 
Pentland and Williams 1989; Kass and Miller 1990; Nor¬ 
ton et al 1991; Wejchert and Haumann 1991). Passive 
systems are those that are acted upon by the environ¬ 
ment but have no internal source of energy. Bouncing 
balls, leaves blowing in the wind, and raindrops falling 
into puddles are all passive systems. To the extent that 
the computer program accurately models the physical 
system and the environment, the resulting motion will 
be natural. In contrast to passive systems, the systems 

tPresent address: Department of Aeronautics and Astro¬ 
nautics, Stanford University, Stanford, CA 94305. 


described in this paper, active systems , contain simulated 
muscles or motors that provide an internal source of en¬ 
ergy and allow the systems to act on the environment. 
Humans, animals, robots, and vehicles are all active sys¬ 
tems. Simulation of active systems requires not only a 
physically realistic model of the system being animated 
but also a control system or computer algorithm that 
activates the muscles or motors in such a way that the 
system performs the desired task. 

Because the internal workings of biological control sys¬ 
tems are much less well understood than the physical 
systems themselves, the simulation of active systems is, 
in general, more difficult than that of passive systems. 
Active systems have been animated by using springs and 
dampers as the control system (Miller 1988), by program¬ 
ming a control system for a simplified model of the sys¬ 
tem being animated (Bruderlin and Calvert 1989), and by 
simulating the actions of the oscillators found in simple 
biological control systems (McKenna and Zeltzer 1990). 

The design of control systems has not yet been au¬ 
tomated for systems of the complexity of those that we 
would like to animate. Others have begun to address 
the question of automatic generation of motion in the 
context of optimization problems and optimal control 
theory (Witkin and Kass 1988; van de Panne, Fiume, 
and Vranesic 1990). The potential generality of these 
approaches makes them among the most interesting new 
methods for animation of dynamic systems. The poten¬ 
tial liability is the growth of the search spaces when ap¬ 
plied to more complex systems. 

Our approach to hand designing these control systems 
builds on previous work in the control of legged robots 
(Raibert and Hodgins 1991; Hodgins 1991; Hodgins and 
Raibert 1990; Raibert 1986). This work provides us with 
a number of techniques that aid in the design of control 
systems for dynamic tasks: state machines for structur¬ 
ing the control laws, low-level control through springs and 
dampers, and symmetry of the motions as a principle for 
the design of the higher level control algorithms. 

In this paper we describe control systems for rigid- 
body models of humans performing four tasks: pumping 
a swing, riding a seesaw, juggling, and pedaling a uni¬ 
cycle. In each case, the simulated models are composed 
of rigid links connected by rotary or sliding joints. The 
models are derived from measurements of living humans 
and cadavers (Meredith 1969a,b; Dempster and Gaugh- 
ran 1965). Each model has enough degrees of freedom 
to perform the stipulated task but only a small fraction 
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Figure 1: The user specifies the desired behavior at a high 
level. The control algorithms compute forces and torques that 
should be applied at each joint based on the state of the sys¬ 
tem and the desired behavior. The internal forces at the joints 
and external forces from the environment are applied to the 
links of the model. The equations of motion are integrated 
forward in time to produce the new state of the system. 


of the number found in humans. The equations of mo¬ 
tion for each model were generated with a commercially 
available package (Rosenthal and Sherman 1986). The 
package generates efficient subroutines for the equations 
of motion of the model (O(n) where n is the number of 
links) using a variant of Kane’s method and a symbolic 
simplification phase. 

A state machine provides the underlying structure for 
the control system for each animation but allows the con¬ 
trol laws to change when the dynamics of the system or 
the user’s commands change. For example, the dynam¬ 
ics of the seesaw change when the legs of a rider touch 
the ground and the control laws must change to match. 
Similarly, the control laws change when the user switches 
from controlling the position to controlling the velocity of 
the unicycle. Each state has specific control laws and the 
transitions between the states are determined by changes 
in the state of the system or by changes in the input to 
the control system. 

Proportional-derivative control is used for low-level po¬ 
sition control in most of the simulations. The torque ex¬ 
erted at a joint is a function of the error in position and 
the relative velocity between the links on either side of 
the joint: 

t = kp(0 - 0^) + kv0 

where r is the control torque, 0 is the relative joint angle 
between the links, 0 d is the rest position of the spring, and 
0 is the relative velocity between the links. The gains, k p 
and k v depend on the mass of the links and the desired 
stiffness and damping of the joint. This servo has the 
same effect on the system as a spring and damper where 
the rest position of the spring is controlled by the control 
system. 

Animations are produced through high-level interac¬ 
tions with the control system. For example, the user 
specifies the forward speed of the unicycle, how long the 
juggler should use one pattern before switching to an¬ 
other, or the maximum height the swing will achieve. At 
each simulation time step, the control system computes 
forces or torques for each joint based on the state of the 
system and the requirements of the task. The equations 
of motion of the system are integrated forward in time, 
and the resulting motion is displayed in a simple graph¬ 
ical model and recorded for later use in an animation. 
The layout of the animation system is shown in figure 1. 



Figure 2: Model used to animate the motion of a human 
pumping a swing. All joints rotate about the y axis and the 
motion of the model is constrained to the x-z plane. The an¬ 
gle of the swing, the knee and the elbow are marked to aid in 
the interpretation of figure 5. 


The details of the individual models and control systems 
are described below. 

Pumping a Swing 

To pump a swing the control actions of the human must 
be coordinated with the fore-aft motion of the swing. Fig¬ 
ure 2 shows the model used to simulate a human pumping 
a swing. The links of the model— cylinders, truncated 
cones, and ellipsoids—are connected by rotary joints at 
the wrists, elbows, shoulders, waist, hips, knees, and an¬ 
kles. The swing is modeled as two rods connected by a 
pivot joint located where the hands attach to the swing. 
A second pivot joint attaches the bottom of the lower 
body to the end of the lower swing rod. This joint rotates 
freely, and the angle between the body and the lower rod 
of the swing is controlled by the elbow, shoulder, and 
wrist joints. The grip of the hands on the swing joint 
is modeled by a pair of orthogonal springs and dampers. 
The motion of the swing and the human are constrained 
to the fore-aft plane. 

Swinging has been studied for two simple models: a 
point mass that slides up and down a rigid rod and a 
system that switches between a double and a compound 
pendulum (Tea and Falk 1968; Burns 1970; Gore 1970; 
Gore 1971; McMullen 1972). These models are shown in 
figure 3. Our model is more complex than either of these, 
but the simpler models provide some physical intuition 
about how the control system can move the joints to 
increase the amplitude of the swinging motion. 
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Point Mesa on a Rigid Rod 


Double Pendulum Compound Pendulum 


Figure 3: Two simplified models of pumping. The leftmost 
model, a point mass on a rigid rod, can be pumped by sliding 
the mass up the rod at the lowest point of the cycle and down 
at the highest in a pattern like that shown by the dashed 
line. This pumping action decreases the moment of inertia 
of the system when all the energy is kinetic and increases it 
when all the energy is potential. The double/compound pen¬ 
dulum model increases the amplitude of the swing by letting 
the second pendulum fall backwards at the highest point of 
the swing. When the fall of the second pendulum is arrested 
and the model becomes a compound pendulum, the angular 
velocity of the first pendulum is increased causing the swing 
to go higher on the next cycle. The motion of a human on a 
swing is similar to the motion of the double/compound pen¬ 
dulum in that the arms are relaxed at the back of the swing 
so that the body falls backward and acts like a double pendu¬ 
lum until it is caught by the extended arms. When the arms 
stop the motion of the body, the system becomes a compound 
pendulum again but with increased angular velocity. 


The state of the control system for pumping depends 
on the angle and velocity of the swing. The transitions 
between the states occur when the swing passes through 
the lowest point of the cycle and when it nears the highest 
point. The state machine is illustrated in figure 4. The 
control laws for each state specify the desired angles and 
the gains for each servo. 

The control system uses a proportional-derivative 
servo or spring/damper system to control each joint. 
When the swing is moving forward the desired angles 
cause the body to lean back and the legs to extend. As 
the swing moves backwards, the desired angles cause the 
arms to move the body forward towards the lower rod 
and the legs to bend. The top graph of figure 5 shows 
the angle of the swing as the control system pumps for 
twenty seconds, coasts for ten seconds, and pumps again 
for ten seconds. The middle graph shows the angle of the 
knee joint as the legs are swung back and forth. The bot¬ 
tom graph shows the angle of the elbow joint as it pulls 
the body forward and allows the body to fall backwards. 



Figure 4: State machine used to control the pumping mo¬ 
tion. The state is determined by the angle and velocity of 
the swing. The state behind pivot and traveling forward 
has the same control laws as the state ahead of pivot and 
traveling forward and serves only to prevent false transi¬ 
tions before the swing has reached its maximum forward po¬ 
sition. For the same reason, ahead of pivot and traveling 
backwards has the same control laws as behind pivot and 
traveling backwards. 


The control for the seesaw uses two independent state 
machines, one for each rider. The seesaw with two riders 
is much like a quadruped bounding in place, and the con¬ 
trol algorithms are similar to those used for the control of 
a bounding quadruped (Raibert and Hodgins 1991). The 
state machine for the control of the seesaw is shown in 
figure 7. The transitions between the states occur when 
the legs touch or leave the ground. 

The control laws consist of proportional-derivative ser¬ 
vos for each joint. The setpoints and gains depend on the 
state. During flight, the leg is moved to an appropriate 
position for touchdown. During compression, a spring 
at the knee joint stores energy and causes the system 
to bounce passively. During extension, the knee is ex¬ 
tended to add energy to the bouncing oscillation of the 
system. The height of the oscillation can be varied by 
changing the extension of the knee. 


Riding a Seesaw 

The seesaw animation has two riders on opposite ends of 
a plank. The control system varies the rotation of the 
plank by changing how hard each of the riders pushes 
off against the ground. The model for the seesaw ani¬ 
mation is shown in figure 6 . The collision model for the 
leg and the ground consists of two orthogonal pairs of 
springs and dampers. The springs are stretched between 
the touchdown and current positions of the end of the 
leg. When the lower end of the leg leaves the ground, the 
springs are disconnected. Like the hands in the swing 
model, the lower arms are attached to the handles of the 
seesaw with springs and dampers. 


Juggling 

To animate juggling, the control system moves the wrist, 
shoulder, and elbow joints of a model so that the hands 
catch and throw balls. The user directs the animation by 
specifying the juggling pattern (cascade, shower, or foun¬ 
tain) and the length of time that the balls are held in the 
hand (dwell time) or flying through the air (flight time). 
The control system perturbs the throws so as to main¬ 
tain the stability of each pattern and produce transitions 
between the patterns. 

Figure 8 illustrates the model used in the juggling sim¬ 
ulation. The hands are not anthropomorphic but are of 
approximately the same size and density as human hands. 
Collisions are detected between each ball and the surfaces 
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time (s) 

Figure 5: The top graph shows the angle of the top rod of the 
swing with respect to vertical. The control system pumped 
the swing for twenty seconds, coasted for ten seconds, and 
pumped again for ten seconds. The vertical dashed lines indi¬ 
cate when the task changed from pump to coast. The middle 
graph shows the movement of the knee joint as the legs are 
swung back and forth. The bottom graph shows the elbow 
joint as it pulls the body forward and allows it to fall back¬ 
wards. The dashed lines in the lower two graphs indicate the 
desired angle for the knee and elbow. The elbow is pulled 
away from the desired angle by the weight of the body when 
the body is leaning back. 


of the fingers, thumbs, and palms and are modeled by a 
spring normal to the plane of the surface and by two 
dampers aligned with the surface and perpendicular to 
each other. Collision forces are applied to both the hand 
and the ball when the ball is in contact with the hand. 

Control System 

The basis of all juggling patterns are accurate throws and 
robust catches. The control system causes the ball to be 
thrown by generating a desired trajectory for the hand 
in cartesian coordinates that will accelerate the ball and 
the hand to the desired velocity by the time they reach 
the release point. The control system computes torques 
that will cause the hand to match the speed of the ball as 
it falls and then close around the ball after contact. This 
method of catching was implemented for a one degree 
of freedom juggling robot by Biihler, Koditschek, and 
Kindlmann (1989). 

The control system has four states: meet, deceler¬ 
ate, accelerate, and follow. In each state the control 
system generates a desired trajectory for the hand that 
will cause it to move to the desired position and arrive 
with the desired velocity and acceleration. Using inverse 
kinematics, the desired hand position is transformed to 
desired positions for the joints of the upper and lower 
arm and wrist. Proportional-derivative control coupled 
with a simplified version of the forward dynamics cause 



lower 



Figure 6: The model for the seesaw animation. The ground 
model is two orthogonal pairs of springs and dampers. The 
ends of the arms are also attached to the handles of the seesaw 
with springs and dampers. 


each joint to track the trajectory. 

During the meet state, the hand moves up to meet the 
falling ball. The control system generates a polynomial 
trajectory that will cause the hand to meet the ball at 
the desired catching position with a velocity that matches 
the velocity of the ball and an acceleration equal to grav¬ 
ity. Matching the descent velocity of the ball reduces the 
chance that the ball will bounce out of the hand before 
the finger and thumb close to constrain it. 

When the ball is in the hand, the control system is 
in either the decelerate or accelerate state. During 
these states the control system chooses a trajectory that 
reverses the motion of the hand and ball in the r direction 
and moves it towards the desired release position. When 
the hand nears the release position, the finger and thumb 
open and the hand follows the ball briefly (0.05 sec) in x 
and y while decelerating in z to prevent collisions which 
would disturb the trajectory of the ball. After the ball 
leaves the hand, the control system generates a trajectory 
that will cause the hand to meet the next ball. 

Juggling Patterns 

Jugglers commonly use three different three-ball pat¬ 
terns: the cascade, shower, and fountain (Buhler and 
Graham 1984; Beek 1989). The three patterns are shown 
in figure 9. In the cascade, each hand throws the balls 
across to the other hand and each throw passes under 
the arriving balls. In the shower, the balls move roughly 
in a circle with the throw from one hand passing above 
the throw from the other hand. In the fountain, each 
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Figure 7: The state machine for one seesaw rider. The transi¬ 
tion from flight to compression occurs when the leg touches 
the ground. The transition from compression to exten¬ 
sion occurs when the knee joint begins to extend. The transi¬ 
tion from extension to flight occurs when the leg leaves the 
ground. 


hand juggles separately, throwing and catching the balls 
without passing them to the other hand. 

To produce these three juggling patterns we chose re¬ 
lease positions, catch positions, and flight times for each 
pattern. The catching and throwing routines were the 
same for all patterns. 

Transitions between Patterns 

The control system performs transitions between pat¬ 
terns by waiting until the first ball in a pattern is caught 
and then setting the throw and catch positions and the 
flight times to those used in the new pattern. Two feed¬ 
back laws make the transitions robust: phase correction 
and collision avoidance. 

The phase of each ball is the time in the cycle at which 
it is caught. In a three-ball cascade or shower, catching 
the first ball signals the beginning of a cycle, the second 
ball is caught 1/3 of the way through the cycle, and the 
third 2/3 of the way through. Phase correction is per¬ 
formed by adjusting the dwell time so that the next time 
the ball is caught it will be closer to the correct phase. 
In the fountain pattern, the two hands operate indepen¬ 
dently and the phase correction algorithm is used to keep 
both the balls and the hands operating in phase. 

The feedback law for collision avoidance is used when 
errors in phasing or the changing patterns of the throws 
would cause two balls to collide. On each throw, the 
control system uses the ballistic equations for the balls 
to predict if the ball in the hand will collide with either 
of the balls in the air. If a collision is expected, the throw 
position is moved by twice the ball radius in x to prevent 
the collision. 

Balancing on a Unicycle 

To maintain balance a unicycle rider must push on the 
pedals in such a way as to correct errors in balance and 
forward speed. The user directs the animation by speci¬ 
fying a desired velocity or a desired position. 



Figure 8: The rigid-body model used in the juggling anima¬ 
tion. The model has eleven links with a total of fifteen degrees 
of freedom. The parameters for the mass, moment of inertia, 
and dimensions of all the links except the hands were obtained 
from Dempster and Gaughran (1965). The hands are of ap¬ 
proximately the same size and density as the human hand 
measured in Dempster and Gaughran (1985). 


The model contains a wheel, a seat, and a body with 
two legs (figure 11). The legs push on the pedals through 
a pair of orthogonal springs and dampers. The springs 
allow the legs to push down and sideways on the pedals 
but not to pull up. The contact model between the wheel 
and the ground is also a pair of springs and dampers. 
This model allows the wheel to roll but does not allow it 
to slip. 

Control 

Unlike the other animations described in this paper, the 
unicycle control problem is continuous and there are no 
impacts that cause the dynamics of the system to change. 
The unicycle wheel is always touching the ground, and 
the control laws do not depend on the state of the system. 
As a result, the state machine is used only to handle 
changes in the control laws when the user switches from 
position control to velocity control. 

The desired torque at the wheel is a function of the 
error in forward speed and the angle of the stem: 

r = k<p<j> -f + kx(id — *) 

where r is the desired torque at the wheel, <f> is the angle 
of the stem relative to vertical, <j> is the velocity of the 
angle of the stem, id is the desired speed of the unicycle, 
x is the actual speed, and and kx are gains. 

There is no motor at the hub, and the desired torque 
at the wheel is produced by moving the hip and knee 
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horizontal ball position (m) 




horizontal ball position (m) 

Figure 9: Ball motion in cascade, shower, and fountain pat¬ 
terns. 


joints so that the legs press down on the pedals with the 
appropriate force. Each leg is assigned a percentage of 
the desired torque based on the sign of the desired torque 
and the current pedal position: 

weight = i + i sin 0 

The other leg has a weighting of 1 — weight. When 
the torque is positive and the pedals are horizontal 
(9 = ±7t/2) the front leg has a weighting of one and 
the rear leg has a weighting of zero. When the pedals are 
vertical, both legs have a weighting of one-half and the 
force is divided evenly between the two legs. The desired 
force along the axis of the leg is increased by a preload 
to ensure that the legs always push down on the pedals 
and remain on the pedals. 

The desired force at the pedal is converted to desired 
torques at the knee and hip by taking the kinematics of 



Figure 11: Schematic drawing of the model used in the uni¬ 
cycle animation. The model consists of a wheel, a scat, and 
a body with two legs divided into upper and lower segments. 
The body is attached to the seat by a pivot joint. The in¬ 
teraction between the lower legs and the massless pedals is 
modeled by a pair of orthogonal spring/dampers. The springs 
allow the legs to push down and sideways on the pedals but 
not to pull up. The ground model for the unicycle wheel is a 
pair of orthogonal spring/dampers. The x spring is stretched 
between the point marking the distance that the wheel has 
roiled and the point on the ground where the wheel is touch¬ 
ing. The point to which the wheel has rolled is xo + 2 irrO 
where xq is the starting location, r is the radius of the wheel, 
and 0 is the number of revolutions of the wheel. The point 
at which the wheel touches the ground is assumed to be the 
point directly beneath the hub. This model does not allow the 
wheel to slip. 


the leg into account and assuming massless legs: 


^fcnce — U fx 


Thtp — —ul(sin(a)/* + cos(cv)/*) - 11/* 

where Tknee is the desired torque at the knee, r/,. P is the 
desired torque at the hip, 11 is the length of the lower leg, 
ul is the length of the upper leg, a is the angle of the 
knee, and /* and f z are the desired forces between the 
pedal and the lower leg in the coordinate system of the 
lower leg. 

The user interacts with the animation of a unicycle 
rider by specifying a desired speed or a desired position. 
The desired position is converted to a desired speed: 

x d = k x (x - x d ) 

where x d is the desired speed, x is the current position, 
x d is the desired position, and k x is a gain. The desired 
speed is limited by a maximum desired speed. Figure 12 
shows the behavior of the system as it starts from rest 
and cycles to a specified location. 

The control for the unicycle simulation has constant 
gains. Vos (Vos 1989; Vos and von Flotow 1990) states 
that the gains must be adjusted based on the state of 
the system for good performance of a three-dimensional 
robot unicycle with a motor at the hub and a horizontal 
turntable for yaw control. When the simulation is ex¬ 
tended to three dimensions and the operating range of 
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Figure 12: The top graph shows the position of the unicy¬ 
cle as it moves towards the goal. The bottom graph shows the 
forward speed. The dashed lines indicate the desired position 
and velocity. 

the simulation is extended, we may also find this to be 
true but constant gains provide adequate performance for 
the two-dimensional model. 

Discussion 

We have presented descriptions of the models and control 
systems for four animations: a swing, a seesaw, a juggler, 
and a unicycle. In each case, the control system is built 
upon a state machine and the transitions between the 
states are determined by changes in the state of the sys¬ 
tem or changes in the input commands. The current state 
determines which control laws should be used to control 
the system. The interactions with the environment and 
many of the low-level control laws are implemented with 
springs and dampers. The state machines and some of 
the control laws are hand designed for each animation. 

If properly designed, a control system will produce 
natural-looking motion when it is used to activate a phys¬ 
ically realistic model. The combination of a physically 
realistic model and a control system can easily produce 
bad motion too, by violating joint or torque limits or by 
performing the task in an unexpected way or with ex¬ 
traneous motions. This problem is particularly apparent 
in underconstrained problems like the swing. There are 
many ways to pump a swing but only a small fraction 
produce motion that resembles the movements made by 
a human pumping a swing. 

Our experiments during the design of these animations 
suggest that several features are required for the genera¬ 
tion of natural-looking motion: 

• The model must contain the key features of the sys¬ 
tem being modeled. If the model is too simple, 
the motion may not appear natural. We originally 
modeled the swing as a single rigid rod (no joint 
where the hand is attached) and discovered that the 
setpoints had to be adjusted very carefully for the 
swinging motion to increase in amplitude. With the 
extra degree of freedom, the control is much more 
robust and the motion appears more natural. We 
also compared a unicycle animation that was pow¬ 
ered by the motion of the legs with one in which 


the control was provided by a torque source at the 
wheel and the legs were positioned kinematically. 
The resulting motions differed because the motor 
could exert a uniform torque in all pedal positions 
while the legs could not. 

• The model should include the control parameters. 
In designing and tuning a control system, we found 
it easy to set the gains too high so that the sim¬ 
ulation produced jerky motion. A strength model 
should be part of the physical model and control 
commands that require the physical system to go 
beyond those limits should be filtered. Lee et al 
(1990) implemented such a system for the task of 
lifting a load. 

• Biological data can provide information about set- 
points and control actions. In an underconstrained 
problem like the swing, choosing the control actions 
is not easy. A wide range of actions cause the swing 
to gain amplitude but only a much smaller range are 
commonly used by humans on a swing. We studied 
measurements of people pumping a swing to learn 
about the joint angles that were used and the tran¬ 
sition points of the control system. More rigorous 
comparison of simulation data with biological data 
can also be used to show to what extent the simu¬ 
lated motion resembles the biological motion. 

• Perfectly smooth motion is not natural. Steady- 
state simulations often produce motion that is too 
repetitive. Each of the phase plots of juggling shown 
in figure 9 contains data from several repetitions of 
the pattern, but there is little variation from one 
toss to the next. This too-perfect motion appears 
“robot-like” when played back through a graphical 
model. We need to take advantage of all oppor¬ 
tunities for adding interest to the motion: changes 
in high-level commands from the user, disturbances 
from the environment, and the addition of noise 
within the system. 

We have not addressed the question of automatic gen¬ 
eration of control systems. Large portions of each of the 
control systems described here were designed by hand 
with the aid of principles gleaned from our experience in 
controlling robots. We are interested in exploring other 
approaches to the problem of generating control systems, 
perhaps by formalizing the principles that were outlined 
here to allow the automatic generation of control systems 
for limited domains or by applying techniques from op¬ 
timal control theory to design the control systems with 
less input from the human designer. 

By approaching the problem of motion generation from 
the perspective of physical realism, we have gained rules 
that make it possible to automatically generate motion 
through simulation. Within this framework we can gener¬ 
ate many different motions that accomplish a single task 
by varying the physical system and the control strategies. 
For example, an adult pumps a swing differently than the 
eight-year-old child we modeled and a skilled unicyclist 
would use different gains and perhaps even different con¬ 
trol laws than a beginner. Even greater variety could 
be produced by violating physical laws. In hand-drawn 
animations emotion and humor are often communicated 
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through exaggerated motions that violate physical laws. 
Eventually, we would like to take advantage of this part 
of the design space; however, we feel that a better under¬ 
standing of techniques for the generation of physically 
correct motion is needed first. 
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Abstract 

The use of physical system simulation has led to realis¬ 
tic animation of passive objects, such as sliding blocks 
or bouncing balls. However, complex active objects like 
human figures and insects need a control mechanism to 
direct their movements. We present a paradigm that 
combines the advantages of both physical simulation and 
algorithmic specification of movement. The animator 
writes an algorithm to control the object and runs this 
algorithm on a physical simulator to produce the anima¬ 
tion. Algorithms can be reused or combined to produce 
complex sequences of movements, eliminating the need 
for keyframing. We have applied this paradigm to control 
a biped which can walk and can climb stairs. The walking 
algorithm is presented along with the results from testing 
with the Newton simulation system. 

CR categories: 1.3.7 [Computer Graphics]: Three di¬ 
mensional graphics and realism - animation; 1.6.3 [Sim¬ 
ulation and Modeling]: Applications 

Keywords: physical simulation, human figure anima¬ 
tion, animation control, constraints, dynamics 

Introduction 

This paper describes a paradigm for the control and ani¬ 
mation of complex active objects such as the human fig¬ 
ure. In this approach the animator develops an algorithm 
which controls the object by specifying certain intuitive 
variables as a function of time and of world state. The 
algorithm is able to continuously monitor the world state 
as it is being automatically updated by an underlying dy¬ 
namics simulation system, and the algorithm is able to 
react when it sees changes in the world state. 

For example, in the case of biped walking, the animator 
might write an algorithm that controls the angle of the 
knees at one point in the animation, and that controls 
the trajectory of the foot at another point. The algorithm 
might monitor the world state and, when it notes an event 
such as a foot touching the ground, stop controlling the 
trajectory of the heel and start controlling the angle of 
the knee. 


Most animators would probably be comfortable with 
the idea of “programming” a human figure to walk. The 
algorithmic approach to animation allows this to be done 
with ease. This is demonstrated by the walking algorithm 
presented below. 

Other Dynamics Work in Graphics 

In some of the first work in this area, Armstrong and 
Green [1] present the equations of motion for tree- 
structured linkages of rigid bodies and discuss an efficient 
method of solving them. 

Witkin and Kass [27] have combined physical sim¬ 
ulation and keyframing to produce realistic animation of 
their jumping Luxo lamp. With their approach the an¬ 
imator uses spacetime constraints to specify several key 
points for selected variables at specific times. Combin¬ 
ing spacetime constraint equations with the Lagrangian 
equations of motion and discretizing over time yields a 
system of equations that are solved to produce the mo¬ 
tion. Our algorithmic approach differs in that the con¬ 
straints can be added or removed “on the fly” as the 
algorithm sees changes in the world state which might 
not be predictable. 

Herr and Wyvill [15] describe a dynamics simulation 
system which allows easy user control through a simula¬ 
tion language and several high level control primitives. 
Our work is similar in that the user can define and control 
arbitrary variables, but we concentrate more on develop¬ 
ing algorithms to control complex objects in an intuitive 
manner. 

van de Panne, Flume, and Vranesic [25] build 
state-space controllers to provide control torques that 
achieve desired goal states from arbitrary initial states. 
Such controllers can be concatenated to produce move¬ 
ment, including cyclic movement like walking. 

Other approaches to combine control and physical sim¬ 
ulation have been explored: Wilhelms [26] and Barzel 
and Barr [3] blend kinematic and dynamic analysis, 
Moore and Wilhelms [22] and Baraff [2] discuss the 
collision and contact problems, Isaacs and Cohen [18] 
incorporate inverse dynamics in their simulation system, 
and Brotman and Netravali [5] use dynamics and op¬ 
timal control to interpolate between key frames. 
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Other Work in Walking and Control 

The algorithmic approach is meant as a general method 
by which to control complex mechanisms. In this paper 
we use the walking problem as an example of an applica¬ 
tion of the algorithmic approach. Some other approaches 
to walking are briefly described here. 

Kearney, Hansen, and Cremer [19], in an ap¬ 
proach very similar to ours, examine the control of me¬ 
chanical systems as a constraint programming problem. 
Bruderlin and Calvert [6] have developed an effective 
goal directed approach to dynamic walking in which the 
animator specifies a few high-level walking parameters. 
McKenna and Zeltzer [20] develop a gait controller 
and low-level motor programs to generate legged motion. 
Zeltzer [28] analyzes various approaches to the control 
of complex animated objects and considers their integra¬ 
tion. Raibert and Hodgins [24] describe control sys¬ 
tems for several legged creatures. Brooks [4] produces 
complex walking behavior in a physical, insect-like robot 
from a distributed network of low-level finite state ma¬ 
chines. 


Other Work in Robotics 

Some further insights on control can be gained from ex¬ 
amining the literature in the field of robotics. While this 
field deals with controlling real, physical objects, some of 
the techniques can be applied to animation. 

Researchers in robotics have taken various approaches 
to reduce the complexity of control programs for physi¬ 
cal objects. The computed torque method for robot arms 
(see Craig [8]) can be viewed as simplifying control by 
reducing the gripper to a unit mass. The control pro¬ 
gram can ignore the dynamics of the robot arm, only 
concerning itself with the position of the end effector as 
a function of time. 

In building his one-legged hopping machine, Raib¬ 
ert [23] partitioned control along three intuitive degrees 
of freedom: hopping, forward speed and body posture. 
This resulted in surprisingly simple control programs for 
the hopping robot. For multi-legged machines, Raibert 
introduced the idea of a “virtual leg” which was defined 
in terms of the robot’s physical legs. This again led to 
simplified control programs. 

Both the computed torque method and Raibert’s vir¬ 
tual leg demonstrate that a proper choice of control vari¬ 
ables can lead to simplified control programs. The prob¬ 
lem with this approach is that there is often no simple 
closed-form mapping of these control variables onto the 
forces and torques needed to control the object. In some 
cases a complete system of equations must be numerically 
solved to make this mapping. This is called “inverse dy¬ 
namics” and is typically rejected by robotics researchers 
as being too expensive to use in real-time control. For 
the purposes of animation, however, it is ideal. Our ap¬ 
plication of inverse dynamics will be described in the next 
section. 


The Algorithmic Approach 

In the algorithmic approach, the animator’s algorithm 
selects a small set of intuitive variables with which to 
control the object over the course of the simulation. The 
algorithm can control predefined variables, such as the 
forces and torques at the joints, or the instantaneous 
translational and rotational accelerations of the various 
components of the object. The algorithm can also control 
variables that it defines as linear combinations of these 
predefined variables. 

For example, the algorithm could, with the appropri¬ 
ate subroutine call to the underlying simulation system, 
define the acceleration of the center of mass of an object 
as 1 

Ucm “ ^ ^ m t * &t , 

t 

where m, is the mass of the i th component of the object (a 
constant), and a, is the translational acceleration of the 
i th component. Then at each time step of the simulation, 
the algorithm could supply a value for a cm - 

The underlying simulation system, called Newton , is a 
general purpose physical simulator. Given a description 
of a complex object (in, say, a computer file), Newton 
will automatically generate the corresponding system of 
Newton-Euler equations of motion which describe the in¬ 
stantaneous behavior of the object. Neivton can then 
integrate these equations of motion over time to pro¬ 
duce the animation. Newton also automatically updates 
the system of equations as kinematic relationships in the 
simulation change (one such change would occur as the 
biped’s foot touches the ground). 

The animator’s algorithm interacts with Newton in the 
following ways: 

• The algorithm can add arbitrary equations and vari¬ 
ables to Newton s system of motion equations. In 
the example above, the algorithm added a variable, 
a C m, and a equation defining that variable in terms 
of other variables of the system. The algorithm can 
remove equations that it previously added to the 
system of motion equations. 

• The algorithm can set the value of a variable at any 
time step of the simulation. In the example above, 
the algorithm could supply a value for the a C m vari¬ 
able at each time step. 

• It may be that the algorithm manipulates Newton's 
system of motion equations such the system becomes 
underconstrained, admitting many solutions. In this 
event, the algorithm can tell Newton, through an 
appropriate subroutine call, to select a motion that 
instantaneously minimizes some quadratic function 
of the variables of the system. 

• The algorithm can observe the world state and act 
upon it. For example, a walking algorithm might 
observe that the heel has touched the ground and 
react by moving into a new state of its execution 
(like a finite-state machine). 
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At each time step of the simulation, Newton evaluates 
the current system of equations to determine values for 
any unknown variables, including the translational and 
rotational accelerations of the individual components of 
the object. Newton integrates these accelerations to pro¬ 
duce the state at the next time step, and this process is 
iterated. 

Format of a Control Algorithm 

A control algorithm can be considered as a set of finite 
state machines. Each machine has an initial state and a 
transition between states is made when some user-defined 
predicate become true. 1 

In the algorithm of Figure 1 there is a single ma¬ 
chine having initial state START and having one transi¬ 
tion START -> CM-ACCEL. The transition is made imme¬ 
diately, and defines a new unknown variable, r cm , causes 
an equation 2 to be added to the system of motion equa¬ 
tions, defines a function / which will be called whenever 
Newton needs a value for f cm , and defines a quadratic 
minimization function. Note that the object which is be¬ 
ing simulated must be defined elsewhere. 

initial-states { START } 
transition START -> CH-ACCEL ehen TRUE 
begin 

new-unknown " r cm " 
add-equation M r cm = X^ 771 * ^ " 

add-function " r cm = /( time ) " 
add-quadratic " Q = " 

end 


Figure 1: A Simple Control Algorithm 


Overview of Newton 

The walking algorithm described in this paper has been 
designed and tested using the Newton simulation sys¬ 
tem [9] developed at Cornell University. The devel¬ 
opment of Newton was inspired by the need for more 
general-purpose, flexible simulation systems. 

Extensive mechanical engineering research has led to 
many developments in physical system simulation. The 
ADAMS [7] and DADS [14] systems are examples of large 
state-of-the-art systems from the mechanical engineering 
domain. Such systems are sophisticated in many ways: 
they support efficient formulations of mechanism dynam¬ 
ics, they use fancy numerical techniques for solving equa¬ 

1 For the sake of clarity the algorithms will be described in 
a Pascal-like notation (however, they are currently written in 
Lisp). 

2 We use quotation marks to indicate that the actual equa¬ 
tions must be represented in some internal manner. 


tion systems, they often handle object flexibility and elas¬ 
ticity, and so on. The recent work by graphics and anima¬ 
tion researchers (discussed above) has generally been less 
sophisticated but has placed greater emphasis on anima¬ 
tion of interesting high-degree-of-freedom mechanisms. 

Still, none of these systems combines the full range of 
features required to make dynamics simulation as power¬ 
ful and useful as it could be. Typically they have almost 
ignored geometric considerations and represented objects 
simply as point masses with associated inertias and co¬ 
ordinate systems. Geometric modeling techniques have 
matured enough to allow object representations used by 
dynamic simulations to include a complete geometric de¬ 
scription usable by a geometry processing module. Fur¬ 
thermore, impact, contact, and friction are typically han¬ 
dled by current systems in an ad hoc or rudimentary man¬ 
ner, if at all. In some cases, for instance, any possible 
impacts must be specified in advance; in others, a kind 
of “force field” technique is used, in which between every 
pair of objects there is a repelling force that is negligible 
except when objects are very close together. In addition, 
the desire to manipulate high-degree-of-freedom objects 
suggests that a module for specification of control algo¬ 
rithms should be a significant part of a dynamics system. 

Newton Architecture 

Using Newton , a designer can define complex three- 
dimensional physical objects and mechanisms and can 
represent object characteristics from various domains. 
An object consists of a number of “models,” each re¬ 
sponsible for organization of object characteristics from 
a particular domain. In most simulations the basic do¬ 
mains of geometry, dynamics, and controlled behavior 
are modeled. A dynamic modeling system, for example, 
is responsible for maintaining an object’s position, veloc¬ 
ity, and acceleration, and for automatically formulating 
the object’s dynamics equations of motion. A geometric 
modeling system is responsible for information about an 
object’s shape, distinguished features on the object, and 
computation of geometric integral properties such as vol¬ 
ume and moments of inertia. It also detects and analyzes 
object interpenetrations so that an interference modeling 
system can deal with collisions between objects. 

Newton has three main components: the definition and 
representation module, the analysis module and the re¬ 
port system. The definition module analyzes high level 
language descriptions of Newton entities and organizes 
the corresponding data structures. The analysis compo¬ 
nent implements the top-level control loop of simulations 
and coordinates the working of various analysis subsys¬ 
tems. The report system handles generation of graphical 
feedback to users during simulations as well as recording 
of relevant information for later regeneration of anima¬ 
tions. 
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Dynamic Analysis in Newton 

A physical object is modeled as a collection of rigid bodies 
related by constraints. Newton-Euler equations of mo¬ 
tion are associated with each individual rigid body. At 
the time an object is created the equations are of the form 

mr = 0 

Jlj x Ju: — 0, 

where m is the mass, f is the second time derivative of 
the position (i.e. the acceleration), J is the 3x3 iner¬ 
tia matrix, and u and w are the rotational velocity and 
acceleration, respectively. 

A specification that two objects are to be con¬ 
nected with a spherical hinge is met by the addi¬ 
tion of one vectorial constraint equation and the ad¬ 
dition of some terms to the motion equations of the 
constrained objects. For a holonomic constraint such 
as this one, the second derivative of the constraint 
equation can be used along with the modified mo¬ 
tion equations to solve for object accelerations and 
reaction forces. Thus, the equations above become 


mjri 

J\Cj\ *+■ w\ x 

m2t2 

J2W2 + tV2 X J2IV2 


Fhinge 
C\ X Fflingc 

— Fhtnge 
^2 X F /tinge 


T\ -\rd>i XCiH-U/1 X (u>i XCl) = r2+0>2 XC2-f t»/2 X (u >2 X C2), 


where c» is the vector from object f’s center of mass to 
the location of the hinge and Fhingc is the constraint force 
that keeps the objects together. Note that the last equa¬ 
tion above is the second time derivative of the holonomic 
constraint equation rj *f c\ = t 2 + C 2 for spherical joints. 
Other kinds of “hinges” commonly used in Newton in¬ 
clude revolute or pin joints, prismatic joints, springs and 
dampers, and rolling contacts. 

If gravity is present during the simulation the sys¬ 
tem will automatically add gravitational force terms to 
the objects’ translational motion equations. The system 
keeps track of the constraints responsible for the vari¬ 
ous terms in the motion equations. Thus, constraints, 
and their corresponding motion equation terms, can be 
removed at any time without necessitating complete red¬ 
erivation of the system of motion equations. 

Using this method of dynamics formulation, closed- 
loop kinematic chains are handled as simply as open 
chains. Though the formulation does lead to a large set 
of equations, the matrices are very sparse and often sym¬ 
metric. Thus, acceptable efficiency is achieved by the use 
of sparse matrix solution techniques. 


Event handling, impact and contact 

Newton , unlike many other simulation systems, can au¬ 
tomatically and incrementally reformulate the motion 
equations as exceptional events occur during simulations. 



Figure 2: Changing Kinematic Relationships 


One kind of exceptional event is a change in kinematic 
relationship between objects. Figure 2 shows a block that 
was initially sliding along a table top. After some time 
the edge of the table is reached and the contact rela¬ 
tionship changes from a plane-plane contact to a plane- 
edge contact. Still later the contact is broken altogether. 
These changing contact relationships are automatically 
detected by Newton. The system of motion equations 
and the related constraint equations are automatically 
maintained by Newton to reflect these changing relation¬ 
ships. 

Newton's event handler is primarily responsible for de¬ 
tection and resolution of impacts, for analysis of con¬ 
tinuous contacts between objects, for the corresponding 
maintenance of temporary hinges that model unilateral 
constraints between objects in contact, and for handling 
of events specified by control programs that necessitate 
changes in the constraint set. For example, the walking 
algorithm might tell the event handler to notify it when 
the biped’s foot touches the ground so that it can change 
the constraint equations. 

The geometric modeling subsystem is responsible for 
detecting and analyzing impacts and interpenetrations. 
In the usual method of handling impacts, the dynamic 
analysis module formulates impulse-momentum equa¬ 
tions in a manner completely analogous to the formu¬ 
lation of the basic dynamics equations, and solves these 
equations to produce the instantaneous velocity changes 
caused by the impact. The details of Newton's methods 
for handling impact, contact and other exceptional events 
are given elsewhere [16, 17, 11, 10], 


Event definition and control 

Support for control programming is provided by allow¬ 
ing users to define their own event types. Events provide 
the mechanism for state transitions in control programs. 
Event definition consists of a specification of how to de¬ 
tect the event (including information about how accu¬ 
rately the time of event occurrence should be isolated) 
and how to resolve it. 
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procedure position-with-PD( equation-name, object, 

x-dosired, delta-time ) 


var x t v, a: quantity 
r: real 

begin 

x = get-position-quantity( object ) 
v = get-velocity-quantity( object ) 
a a get-acceleration-quantity< object ) 

r * - delta-time / log( .01 ) 

add-named-equation( equation-name, 

" a-f~ir+“j(r — x-desired) = 0 " ) 

end 



Figure 3: PD Controller Used in Positioning 

Low-level Controllers 

In designing algorithms with Newton we found ourselves 
frequently using PD (proportional-derivative) controllers 
and curve-fitting controllers to control the “trajectory” of 
many of the defined quantities. In controlling the biped, 
for example, quintic interpolation was used to plot the 
trajectory of the heel, and a PD controller was used to 
orient the foot before it struck the ground. A small li¬ 
brary of these controllers is used in the biped algorithm, 
and will be described here. 

PD controllers are used in the biped algorithm to con¬ 
trol orientation, position and joint angle. Each controller 
adds an equation to the system of motion equations which 
defines the second derivative of the quantity in terms of 
the first derivative and the quantity itself. 

The procedure in Figure 3 adds an equation which pro¬ 
duces accelerations to move an object to within 1% of a 
position x-desired within a given time delta-time. The 
equation continues to affect the object’s motion until it is 
explicitly removed by the control algorithm. The quan¬ 
tities x, v and a are data structures representing state 
variables of the controlled object. These data structures 
are used by the add-named-equation function to create 
the appropriate equation. 

The Biped Algorithms 

We have developed two algorithms to control a biped: one 
for straight-line walking and one for walking up stairs. An 
abbreviated version of the walking algorithm is shown in 
Figure 8. 

The simulated biped consists of a torso, two legs with 
knee joints and two feet with toe joints. This model 
was adapted from a description of McMahon [21] and is 
shown in Figure 4. The hips are three degree of freedom 
spherical joints, the ankles are two degree of freedom uni¬ 
versal joints, while the knees and toes are one degree of 
freedom revolute joints, making a total of fourteen de- 


Figure 4: Simulated Biped Model 

grees of freedom. The biped is about 180 centimeters 
tall, weighs 85 kilograms, and has moments approximat¬ 
ing those of a human being. 

Walking Algorithm 

For ease of exposition, the walking algorithm of Figure 8 
is an abbreviated version of our actual algorithm. We 
have hidden many of the lower level procedures (in par¬ 
ticular, those which compute the trajectory of the heel). 
The actual algorithm is written is Lisp; a simulation lan¬ 
guage like that of Herr and Wyvill [15] will be imple¬ 
mented in the future. 

The algorithm has three states: START, SWING and 
DOUBLE-SUPPORT. Consider the START -> SWING transi¬ 
tion in Figure 8. After this transition (that is, during 
the SWING phase) the torso is forced to remain in a fixed 
orientation by the TORSO-ORIENTATION constraint. The 
siving foot follows a trajectory defined by an equation 
called SWING-HEEL-TRAJECTORY which was determined by 
the procedure move-heel-to-target, the stance leg is 
stiffened with set-angle-with-PD, the foot is oriented 
for landing with orient-vith-PD, and the angle of the 
toe is set with set-angle-with-PD. 

In the DOUBLE-SUPPORT phase, the constraints on the 
swing foot are removed, the names of the swing and 
stance legs are swapped, and the torso is constrained to 
accelerate slightly forward, which helps the trailing heel 
to lift. 

The largest number of constraints are applied in the 
SWING phase, during which eleven scalar equations have 
been added to Newton's system of motion equations. 
Since the biped has fourteen degrees of freedom, it re¬ 
mains underconstrained at all times. A quadratic cost 
function Q is defined in order to fully determine the mo¬ 
tion of the biped (a motion is chosen to minimize Q). 
The cost function is a weighted sum of the translational 
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Figure 5: Newton Statistical Output 


and angular accelerations, and of the difference between 
the torso translational acceleration and some accelera¬ 
tion defined by a function F which tries to keep the torso 
mid-way between the two feet. 

A slightly more complex walking algorithm was actu¬ 
ally implemented and tested with the Newton simul: tion 
system. Figure 6 shows ten frames in which the biped 
completes a full cycle. The full simulation consisted of 
twenty seconds of straight-line walking on a flat surface. 

Stair Climbing 

Another version of the algorithm was developed for stair 
climbing. The principal differences between the walk¬ 
ing and climbing algorithms were: a more complicated 
function to determine the trajectory (it has to avoid the 
steps), a “loose constraint” holding the torso upright, 
which allowed the torso to sway in a natural manner (this 
is explained below), and various parameter changes (for 
example, the foot strike orientation will be different when 
climbing stairs than when walking). 

Figure 7 shows six frames (side view and back view) 
in which the biped lifts the right foot. Note that the 
torso sways slightly (the degree of sway can be changed 
trivially) and that the torso moves from side to side to 
be over the supporting foot. 

Discussion 

The walking algorithm of Figure 8 looks almost too sim¬ 
ple to be true. While a lot of the underlying procedures 
have not been described in this paper, the real reason 
for this simplicity is that Newton automatically handles 
almost all of the underlying dynamics, and, if we choose, 
can also automatically handle the detection and resolu¬ 


tion of impact and contact. 3 

Due to the simplicity of our current biped model, the 
algorithms are forced to use too many constraints to 
achieve the desired motion. In particular, the trajec¬ 
tory of the heel must be exactly specified, yielding mo¬ 
tion which can sometimes appear unnaturally stiff. Ex¬ 
periments have shown that the best way to avoid this 
stiffness is to “loosely constrain” the heel trajectory by 
adding a weighted term to the minimization function Q. 
This weighted term is the square of the difference between 
the actual toe acceleration and a computed acceleration 
which guides the toe along the desired trajectory. 


Future Work 

We will experiment with elastic tendons in the hope that 
the swing phase will not have to specify an explicit tra¬ 
jectory for the heel. Instead, no torque would be applied 
in the swing leg; it would be pulled forward by the stored 
energy of the stretched tendons. This might approximate 
“ballistic walking” as described by McMahon[21]. 

The algorithms will be extended to include downstairs 
walking and turning on a level surface. Once a suite of 
such algorithms has been developed, we will be able to de¬ 
fine a set of high level commands such as “walk forward” 
and “step up”. With these commands, the animation of 
walking bipeds should be a simple task for the animator. 


Summary 

We have presented an algorithmic approach to control. 
This approach allows the animator to choose intuitive 
degrees of freedom by which to control an object. The 
control algorithm adds and removes constraint equations 
“on the fly” as the world state changes; a priori knowl¬ 
edge of the exact moment of each state change is not 
required. 

With the algorithmic approach, all consideration of 
dynamics and impact is left to the Newton simulation 
system. The burden on the animator is further re¬ 
duced by allowing underdetermined specification of mo¬ 
tion through the use of constrained optimization tech¬ 
niques. 

We have presented an algorithm to control a simulated 
biped, along with results from its execution on the New¬ 
ton simulation system. The algorithm has the advantage 
of being intuitive, simple to program, and reusable. 


3 For the sake of efficiency, two additional finite state ma¬ 
chines — one for each foot — are used to deal with impact 
and contact, rather than allowing Newton to do so in a more 
general, and hence more expensive, manner. These finite state 
machines are hidden from the animator. 
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HRLKIMG - right foot leads off _ 

Figure 6: Walking Cycle 



Figure 7: Climbing Cycle 
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const time-in-air = 0.5 s 

foot-strike-orientation = 10° about (001) 
torso-orientation = —10° about (0 0 1) 

^ B Kp (f*torjo “ 2^* foot "h^V/oof)) H" A w (rtorjo — H"^r/oot)) 

l«t Q = (52 w 2 + 52 + 20 (fcorso —F) 2 ) 

initial-states ° { START } 


transition START -> SWING when TRUE 
begin 

add-quadratic( Q ) 

orient-with-PD( TORSO-ORIENTATION, TORSO, torso-orientation, .2s) 

move-heel-to-target( SWING-HEEL-TRAJECTORY, swing-heel ) 

set-angle-with-PD( STANCE-KNEE-ANGLE, stance-knee, 175°, 0.1 s ) 

orient-with-PD( SWING-FOOT-ORIENTATION, swing-foot, foot-strike-orientation, time-in-air ) 

set-angle-with-PD( SWING-TOE-ANGLE, swing-toe, 0°, time-in-air ) 

end 

transition SWING -> DOUBLE-SUPPORT when hits-ground( swing-foot ) 
begin 

remove-equations( SWING-HEEL-TRAJECTORY, SWING-F00T-0RIENTATION, SWING-TOE-ANGLE ) 
swap-swing-and-stance () 
accelerate-torso( TORSO-ACCELERATION ) 
end 

transition DOUBLE-SUPPORT -> SWING when leaves-ground( swing-foot ) 
begin 

remove-equation( TORSO-ACCELERATION ) 
remove-equation( STANCE-KNEE-ANGLE ) 

move-heel-to-target( SWING-HEEL-TRAJECTORY, swing-heel ) 

set-angle-with-PD( STANCE-KNEE-ANGLE, stance-knee, 175°, 0.1 s ) 

orient-with-PD( SWING-FOOT-ORIENTATION, swing-foot, foot-strike-orientation, time-in-air ) 

set-angle-with-PD( SWING-TOE-ANGLE, swing-toe, 0 W , time-in-air ) 

end 


Figure 8: Abbreviated Walking Algorithm 
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Abstract 

Help is a combination of editor, window system, 
shell, and user interface that provides a novel environment for 
the construction of textual applications such as browsers, 
debuggers, mailers, and so on. It combines an extremely lean 
user interface with some automatic heuristics and defaults to 
achieve significant effects with minimal mouse and keyboard 
activity. The user interface is driven by a file-oriented pro¬ 
gramming interface that may be controlled from programs or 
even shell scripts. By taking care of user interface issues in a 
central utility, help simplifies the job of programming appli¬ 
cations that make use of a bitmap display and mouse. 

Keywords: Windows, User Interfaces, Minimalism 

Background 

Ten years ago, the best generally available interface to 
a computer was a 24x80 character terminal with cursor 
addressing. In its place today is a machine with a high- 
resolution screen, a mouse, and a multi-window graphical 
user interface. That interface is essentially the same whether 
it is running on a PC or a high-end 3D graphics workstation. 
It is also almost exactly the same as what was available on the 
earliest bitmap graphics displays. 

The decade that moved menus and windows from the 
research lab to more than ten million PC's, that changed com¬ 
puter graphics from an esoteric specialty to a commonplace, 
has barely advanced the state of the art in user interfaces. A 
case can be made that the state of the art is even backsliding: 
the hardware and software resources required to support an X 
terminal are embarrassing, yet the text editor of choice in uni¬ 
versities on such terminals continues to be a character-based 
editor such as vi or emacs, both holdovers from the I970’s. 
With the exception of the Macintosh, whose users have found 
many creative ways to avoid being restrained (or insulted) by 
the decision that they would find more than one mouse button 
confusing, the new generation of machines has not freed its 


This is a revision of a paper by the same lillc published in the Proceedings of 
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users from the keyboard-heavy user interfaces that preceded 
them. 

There are many reasons for this failure — one that is 
often overlooked is how uncomfortable most commercially 
made mice are to use — but the most important might well be 
that the interfaces the machines offer are just not very good. 
Spottily integrated and weighed down by layers of software 
that provide features too numerous to catalog and too special¬ 
ized to be helpful, a modem window system expends its 
energy trying *° look good, cither on a brochure or on a dis¬ 
play. What matters much more to a user interface is that it 
feel good. It should be dynamic and responsive, efficient and 
invisible [Pike88]; instead, a session with X windows some¬ 
times feels like a telephone conversation by satellite. 

Where will we be ten years from now? CRT’s will be 
a thing of the past, multimedia will no longer be a buzzword, 
pen-based and voice input will be everywhere, and university 
students will still be editing with emacs. Pens and touch¬ 
screens are too low-bandwidth for real interaction; voice will 
probably also turn out to be inadequate. (Anyway, who 
would want to work in an environment surrounded by people 
talking to their computers?) Mice are sure to be with us a 
while longer, so we should learn how to use them well. 

With these churlish thoughts in mind, I began a couple 
of years ago to build a system, called help, that would have 
as efficient and seamless a user interface as possible. 1 delib¬ 
erately cast aside all my old models of how interfaces should 
work: the goal was to leam if I could do better. I also erased 
the usual divisions between components: rather than building 
an application or an editor or a window system, I wanted 
something that centralized a very good user interface and 
made it uniformly available to all the components of a sys¬ 
tem. 

Introduction 

Help is an experimental program that combines 
aspects of window systems, shells, and editors to address 
these issues in the context of textual applications. It is 
designed to support software development, but falls short of 
being a true programming environment. It is not a ‘toolkit’; it 
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is a self-contained program, more like a shell than a library, 
that joins users and applications. From the perspective of the 
application (compiler, browser, etc.), it provides a universal 
communication mechanism, based on familiar Unix® file 
operations, that permits small applications — even shell pro¬ 
cedures — to exploit the graphical user interface of the sys¬ 
tem and communicate with each other. For the user, the inter¬ 
face is extremely spare, consisting only of text, scroll bars, 
one simple kind of window, and a unique function for each 
mouse button — no widgets, no icons, not even pop-up 
menus. Despite these limitations, help is an effective envi¬ 
ronment in which to work and, particularly, to program. 

The inspiration for help comes from Wirth’s and 
Gutknecht’s Oberon system [Wirt89, Reis91], Oberon is an 
attempt to extract the salient features of Xerox’s Cedar envi¬ 
ronment and implement them in a system of manageable size. 
It is based on a module language, also called Oberon, and 
integrates an operating system, editor, window system, and 
compiler into a uniform environment. Its user interface is dis¬ 
armingly simple: by using the mouse to point at text on the 
display, one indicates what subroutine in the system to exe¬ 
cute next. In a normal Unix shell, one types the name of a file 
to execute; instead in Oberon one selects with a particular 
button of the mouse a module and subroutine within that 
module, such as Edit.Open to open a file for editing. 
Almost the entire interface follows from this simple idea. 

The user interface of help is in turn an attempt to 
adapt the user interface of Oberon from its language-oriented 
structure on a single-process system to a file-oriented multi¬ 
process system. Plan 9 [Pike90]. That adaptation must not 
only remove from the user interface any specifics of the 
underlying language; it must provide a way to bind the text on 
the display to commands that can operate on it: Oberon passes 
a character pointer; help needs a more general method 
because the information must pass between processes. The 
method chosen uses the standard currency in Plan 9: files and 
file servers. 

The interface seen by the user 

This section explains the basics of the user interface; 
the following section uses this as the background to a major 
example that illustrates the design and gives a feeling for the 
system in action. 

Help operates only on text; at the moment it has no 
support for graphical output. A three-button mouse and key¬ 
board provide the interface to the system. The fundamental 
operations are to type text with the keyboard and to control 
the screen and execute commands with the mouse buttons. 
Text may be selected with the left and middle mouse buttons. 
The middle button selects text defining the action to be exe¬ 
cuted; the left selects the object of that action. The right but¬ 
ton controls the placement of windows. Note that typing does 
not execute commands; newline is just a character. 

Several interrelated rules were followed in the design 
of the interface. These rules are intended to make the system 
as efficient and comfortable as possible for its users. First. 


brevity: there should be no actions in the interface — button 
clicks or other gestures — that do not directly affect the sys¬ 
tem. Thus help is not a ‘click-to-type’ system because that 
click is wasted; there are no pop-up menus because the ges¬ 
ture required to make them appear is wasted; and so on. Sec¬ 
ond, no retyping: it should never be necessary or even worth¬ 
while to retype text that is already on the screen. (Many sys¬ 
tems allow the user to copy the text on the screen to the input 
stream, but for small pieces of text such as file names it often 
seems easier to retype the text than to use the mouse to pick it 
up, which indicates that the interface has failed.) As a corol¬ 
lary, when browsing or debugging, rather than just typing 
new text, it should be possible to work efficiently and com¬ 
fortably without using the keyboard at all. Third, automa¬ 
tion: let the machine fill in the details and make mundane 
decisions. For example, it should be good enough just to 
point at a file name, rather than to pass the mouse over the 
entire textual string. Finally, defaults: the most common use 
of a feature should be the default. Similarly, the smallest 
action should do the most useful thing. Complex actions 
should be required only rarely and when the task is unusually 
difficult. 

The help screen is tiled with windows of editable 
text, arranged in (usually) two side-by-side columns. Figure 
l shows a help screen in mid-session. Each window has 
two sub windows, a single tag line across the top and a body 
of text. The tag typically contains the name of the file whose 
text appears in the body. 

The text in each subwindow (tag or body) may be 
edited using a simple cut-and-paste editor integrated into the 
system. The left mouse button selects text; the selection is 
that text between the point where the button is pressed and 
where it is released. Each subwindow has its own selection. 
One subwindow — the one with the most recent selection or 
typed text — is the location of the current selection and its 
selection appears in reverse video. The selection in other sub- 
windows appears in outline. 

Typed text replaces the selection in the subwindow 
under the mouse. The right mouse button is used to rearrange 
windows. The user points at the tag of a window, presses the 
right button, drags the window to where it is desired, and 
releases the button. Help then does whatever local rear¬ 
rangement is necessary to drop the window to its new location 
(the rule of automation). This may involve covering up some 
windows or adjusting the position of the moved window or 
other windows. Help attempts to make at least the tag of a 
window fully visible; if this is impossible, it covers the win¬ 
dow completely. 

A tower of small black squares, one per window, 
adorns the left edge of each column. (See Figure 1.) These 
tabs represent the windows in the column, visible or invisible, 
in order from top to bottom of the column, and can be clicked 
with the left mouse button to make the corresponding window 
fully visible, from the tag to the bottom of the column it is in. 
A similar row' across the top of the columns allows the 
columns to expand horizontally. These little tabs are an ade- 
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I_headers halp/Boot_Exit 


MM 




/usr/rob/src/help/ Close! Get! | 

■r 

/help/edit/stf Put!| Close! G 

W dat.c 

M dat.h 

H erjrs.c 

1 exec.c 

Jfilc.c 

- 

Open /usr/rob/src/help 

Pattern ’’ 

Text , meraraove| , 

Cut Paste Snarf 

Write New 

/usr/rob/src/help/errs.c Close! Get! i 


/help/cbr/stf Close! Get! 

/usr/rob/src/help/file.c Close! Get! | 

U Open mk src decl uses *.c 

I 

/help/db/stf Close! Get! 

B qid = dp->qid; 

■ if(tab==dirtabO S.& dp->qid==(CHDIR 1 Qdirl)) 

H break; 

ps pc regs broke 

stack kstack nextkstack 

fl foeramovdC dir .name, dp->name, NAMELEN) ; 

B dir.ql1.path = (QPAGE(fp-> path) <<<JPAGESHI FT 

1 

/help/tnail/stf Close! Get! 

] headers nessages delete reread se| 

/inips/include/libc.h Close! Get! | 


From nick@cs.bbk.ac.uk Puts| C 

extern void* menccpy(void*, void*, int, ulong); 

extern void* menset(void*, int, ulong); 

extern int memcnp(void*, void*, ulong); 

extern void* tnencpy(void*, void*, ulong); 

From research.att.conics.bbk.ac.u 
k!localhostlcs.bbk.ac.ukSnick Fri 
Apr 12 14:48:23 EOT 1991 

Subject: UNIX in song & verse 

I Rob, 

I The UKUUG are collecting old-tine 
fi verses about UNIX before they 

R disappear from the ninds of thosej 


extern void* morachr(v>ia*V int, ulong); 

/♦ 

* string routines 
*/ 


Figure l: A small help screen showing two columns of windows. The current selection is the black line in the bottom left window. 
The director)' /usr/rob/src/help has been Opened and, from there, the source files /usr/rob/src/help/errs. c and 
file ,c. 


quate but not especially successful solution to the problem of 
managing many overlapping windows. The problem needs 
more work; perhaps the file name of each window should pop 
up alongside the tabs when the mouse is nearby. 

Like the left mouse button, the middle button also 
selects text, but the act of releasing the button does not leave 
the text selected; rather it executes the command indicated by 
that text. For example, to cut some text from the screen, one 
selects the text with the left button, then selects with the mid¬ 
dle button the word Cut anywhere it appears on the display. 
(By convention, capitalized commands represent built-in 
functions.) As in any cut-and-paste editor, the cut text is 
remembered in a buffer and may be pasted into the text else¬ 
where. If the text of the command name is not on the display, 
one just types it and then executes it by selecting with the 
middle button. Note that Cut is not a ‘button’ in the usual 
window system sense: it is just a word, wherever it appears, 
that is bound to some action. To make things easier, help 
interprets a middle mouse button click (not double click) any¬ 
where in a word as a selection of the whole word (the rule of 
defaults). Thus one may just select the text normally, then 
click on Cut with the middle button, involving less mouse 
activity than with a typical pop-up menu. If the text for selec¬ 
tion or execution is the null string, help invokes automatic 
actions to expand it to a file name or similar context- 
dependent block of text. If the selection is non-null, it is 
always taken literally. 

As an extra acceleration, help has two commands 
invoked by chorded mouse buttons. While the left button is 
still held down after a selection, clicking the middle button 
executes Cut; clicking the right button executes Paste, 


replacing the selected text by the contents of the cut buffer. 
These are the most common editing commands and it is con¬ 
venient not to move the mouse to execute them (the rules of 
brevity and defaults). One may even click the middle and 
then right buttons, while holding the left down, to execute a 
cut-and-paste, that is, to remember the text in the cut buffer 
for later pasting. 

More than one word may be selected for execution; 
executing Open /usr/rob/lib/profile creates a new 
window and puts the contents of the file in it. (If the file is 
already open, the command just guarantees that its window is 
visible.) Again, by the rule of automation, the new window’s 
location will be chosen by help. The hope is to do some¬ 
thing sensible with a minimum of fuss rather than just the 
right thing with user intervention. This policy was a deliber¬ 
ate and distinct break with most previous systems. (It is pre¬ 
sent in Oberon and in most tiling window systems but help 
takes it farther.) This is a contentious point, but help is an 
experimental system. One indication that the policy is sound 
is that minor changes to the heuristics often result in dramatic 
improvements to the feel of the system as a whole. With a lit¬ 
tle more work, it should be possible to build a system that 
feels just right. 

A typical shell window in a traditional window system 
permits text to be copied from the typescript and presented as 
input to the shell to achieve some sort of history function: the 
ability to re-execute a previous command. Help instead tries 
to predict the future: to get to the screen commands and text 
that will be useful later. Every piece of text on the screen is a 
potential command or argument for a command. Many of the 
basic commands pull text to the screen from the file system 
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Ihelp/BootExit 

MS 


/usr/rob/lib/profile Close! 
bind -c Shone/tnp /tup 
bind -a Shome/bin/rc /bin 
bind -a Shoane/bin/Scputype /bin 


fn * l if{! - S(* 0) $♦ I 
fn g* t if(1 ~ $1* 0) $» i 


helplM)ot=(edit cbr db mail)| 


Get! 


SI 


/help/edit/stf Put Ij Close! 
Open /usr/rob/lib/profile| 

Pattern 1 ’ 

Text 1 ’ 

Paste Snarf 
tfri& Now 


Get! 


switch($service)£ 
case terminal 

bind ’(k’ /net/dk 
prompt=(’g$ ’ 1 ») 
sitc=plan9 

exec 8.5 -i 8.5start l/lib/font/bit/peln/9.0 

case cpu 

bind ’ilk’ /net/dk ( needed on a nonet terminal 

bind -b /nnt/term/mnt/8.5 /dev 

news 

fortune 


Figure 2: Executing Cut by sweeping the word while holding down the middle mouse button. The text being selected for execution is 
underlined. 


rr 

help/Boot Exit 




Mm 





1 





■ 

/usr/rob/lib/profile Put! (Close! Get! 

■ 


/help/edit/stf Put!| Close! Get! 


■ 

/usr/rob/src/help/help,c Close! Get! j 

■ 


Open /usr/rob/src/help/help. cj 

Pattern •* 

Text ’’ 

Cut Paste Snarf 

Write New 


■ 

■ 

•include <u.h> 
tinclude <libc.h> 

•include <libg.h> 

(include <libframe.h> 

(include "c|at.h" 

(include "fns.h" 





int nouseslave; 

int kbdslave; 






/usr/rob/src/help/dat. h Close! Get! | 






jtypedef struct Addr Addr; 

typedef struct Client Client; 

typedef struct Page Page; 

typedef struct Proc Proc; 

typedef struct String String; 

typedef struct Text Text; 





Figure 3: Opening files. After typing the full path name of help. c, the selection is automatically the null string at the end of the file 
name, so just click Open to open that file; the defaults grab the whole name. Next, after pointing into dat .h, Open will gel 
/usr/rob/src/help/dat,h. 


with a minimum of fuss. For example, if Open is executed 
without an argument, it uses the file name containing the most 
recent selection (the rule of defaults). Thus one may just 
point with the left button at a file name and then with the mid¬ 
dle button at Open to edit a new file. Using all four of the 
rules above, if Open is applied to a null selection in a file 
name that does not begin with a slash (/), the directory name 
is extracted from the file name in the tag of the window and 
prepended to the selected file name. An elegant use of this is 
in the handling of directories. When a director)' is Opened, 
help puts the its name, including a final slash, in the tag and 
just lists the contents in the body. (See Figure 1.) 

For example, by pointing al dat. h in the source file 
/usr/rob/src/help/help. c and executing Open, a 
new window is created containing the contents of 
/usr/rob/src/help/dat ,h: two button clicks. (See 


Figure 3.) Making any non-null selection disables all such 
automatic actions: the resulting text is then exactly what is 
selected. 

That Open prepends the directory name gives each 
window a context: the directory in which the file resides. The 
various commands, built-in and external, that operate on files 
derive the director)' in which to execute from the tag line of 
the window. Help has no explicit notion of current working 
directory; each command operates in the directory appropriate 
to its operands. 

The Open command has a further nuance: if the file 
name is suffixed by a colon and an integer, for example 
help. c: 27, the window will be positioned so the indicated 
line is visible and selected. This feature is reminiscent of 
Robert Henry’s error(l) program in Berkeley Unix, 
although it is integrated more deeply and uniformly. Also, 
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unlike error, help’s syntax permits specifying general 
locations, although only line numbers will be used in this 
paper. 

It is possible to execute any external Plan 9 command. 
If a command is not a built-in like Open, it is assumed to be 
an executable file and the arguments are passed to the com¬ 
mand to be executed. For example, if one selects with the 
middle button the text 

grep '‘main' /sys/src/cmd/help/*.c 

the traditional command will be executed. Again, some 
default rules come into play. If the tag line of the window 


containing the command has a file name and the command 
does not begin with a slash, the directory of the file will be 
prepended to the command. If that command cannot be found 
locally, it will be searched for in the standard directory of 
program binaries. The standard input of the commands is 
connected to en empty file; the standard and error outputs are 
directed to a special window, called Errors, that will be 
created automatically if needed. The Errors window is also 
the destination of any messages printed by the built-in com¬ 
mands. 

The interplay and consequences of these rules are eas¬ 
ily seen by watching the system in action. 


An example 

In this example I will go through the process of fixing a bug reported to me in a mail message sent by a user. Please par¬ 
don the informal first person for a while; it makes the telling easier. 

When help starts it loads a set of ‘tools', a term borrowed from Oberon, into the right hand column of its initially two- 
column screen. These are files with names like /help/edit/stf (the stuff that the help editor provides), 
/help/mail/stf, and so on. Each is a plain text file that lists the names of the commands available as parts of the tool, col¬ 
lected in the appropriate directory. A help window on such a file behaves much like a menu, but is really just a window on a 
plain file. The useful properties stem from the interpretation of the file applied by the rules of help; they are not inherent to the 
file. 


ar 


help/Boot 


Exit 



/help/edit/stf 

Close! 

Get! 


Open 

Pattern •» 

Text ’’ 

Cut Paste 

Write New 

Snarf 



/help/cbr/stf 

Close! 

Get! 

[f] Open mk src decl uses *.c | 


/help/db/stf 

Close! 

Get! 1 

[ 

ps pc 

stack kstack 

regs broke 

nextkstack 


/help/mail/stf 

Close! 

Get! 


headers messages delete 

reread send 


\ 








Figure 4: The screen after booting. 


To read my mail, I first execute headers in the mail tool, that is, 1 click the middle mouse button on the word headers in the 
window containing the file /help/mail/stf . This executes the program /help/mail/headers by prefixing the direc¬ 
tory name of the file /help/mail/stf, collected from the tag, to the executed word, headers. This simple mechanism 
makes it easy to manage a collection of programs in a directory. 
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r 

/help/edit/stf 

Close! 

Get! 


Open 

Pattern * * 

Text *’ 

Cut Paste 

Write New 

Snarf 


r 

/help/cbr/stf 

Close! 

Get! 

□ Open ink src decl uses *.c | 

r 

/help/db/stf 

Close! 

Get! | 

i 

ps pc 

stack kstack 

regs broke 

nextkstack 

r 

/help/mall/stf 

Close! 

Get! 


headers messages delete 

1 s 

reread send 

r 

/nail/box/rob/nbox 

/bin/help/nail/| 




2 sean Tue Apr 16 19:26 EOT 

3 attunixirrg Tue Apr 16 19:03 EOT 1991 

4 knight%MRCO.CARLETON.CAGnitvna.nit.ed 
u Tue Apr 16 19:01 EOT 

5 deutschfcPARCPLACE. COHSmitvma.nit. edu 
Tue Apr 16 18:54 EOT 

6 howard Tue Apr 16 15:02 EOT 

7 deutsch*PARCPLACE.CON&nitvwa.nit.edu 
Tue Apr 16 12:52 EDT 


headers! help/Boot 


Exit 


Figure 5: After executing mail/headers. 

Headers creates a new window containing the headers of my mail messages, and labels it /mail/box/rob/mbox. I 
know Sean has sent me mail, so I point at the header of his mail (just pointing with the left button anywhere in the header line 
will do) and click on messages. 


1 

— 

headers help/Boot Exit 




m 







- i 



■r 


Fron sean Close! | 



/help/edit/stf Close! Get! 




From sean Tue Apr 16 19:26:14 EDT 1991 

1 tried your new help and got this: 

help 176153: user TLB aiss (load or fetch) badvaddr=0x0 
help 176153: statussOxfbOc pc=0xl8df4 sp=0x3f4e8 



Open 

Pattern 5 ’ 

Text ’ 1 

Cut Paste Snarf 

Write New 




1 



/help/cbr/stf Close! Get! 






fj Open nk src decl uses *.c | 







/help/db/stf Close! Get! 







ps pc regs broke 

stack kstack nextkstack 







/help/nail/stf Close! Get! 






■ 

headers messages delete reread send 

1 N 







/mail/box/rob/nbox /bin/help/nail/| 






1 

1 chkOallas.con Tue Apr 16 19:30 EDT 

2 sjean Tue Apr 16 19:26 EDT 

3 attunixJrrg Tue Apr 16 19:03 EDT 1991 

4 knight%MRCQ.CARLETON.CAfcmitvma.wit.ed 
u Tue Apr 16 19:01 EDT 

5 deutschfcPARCPLACE.COWSnltvna.nit.edu 
Tue Apr 16 18:54 EDT 

6 howard Tue Apr 16 15:02 EDT 

7 deutschtPARCPLACE.COKSnitvna.nit.edu 
Tue Apr 16 12:52 EDT 


Figure 6: After applying messages to the header line of Sean's mail. 

A new version of help has crashed and a broken process lies about waiting to be examined. (This is a property of Plan 9. not of 
help.) I point at the process number (I certainly shouldn’t have to type it) and execute stack, in the debugger tool. 
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/help/db/stf. This pops up a window containing the traceback as reported by adb, a primitive debugger, under the aus¬ 
pices of /help/db/stack. 


nr 


headers help/Boot 


Exit 


From sean 


Close! T 


Fron sean Tue Apr 16 19:26:14 EOT 1991 
1 tried your new help and got this: 

help 176153: user TLB niss (load or fetch) badvaddrmOxO 
help 1761J53: statusmOxfbOc pc=0x!8df4 sp=0x3f4e8 


/usr/rob/src/help/ 


176153 stack CloseTf 


I 


last exception: TLB miss (load or fetch) 
/sys/src/libc/mips/strchr.s:34 strchr+»68? MOVW 0(R3),R5 
strchr(c=«3c, s=IO) called from strlen+llc /sys/src/libc/port 
/strlen.c:7 

strlen(s=#0) called from textinsert+130 text.c:32 
textinsert(sel=#l, t=H40e60, s=HO, qO=#d, fullmfll) called fro 
m errs+leS errs.c:34 
n = #3d7cc 

errs(s=#0) called fron Xdie2+til4 exec.c:252 
p = f40d88 

Xdie2() called fron lookup+tfc4 exec.c:101 
lookup(s=M40be8) called from execute+*50 exec.c:207 
i = Hf 
n = MicSbf 

execute(t=tf3ebbc, p0=!2, pl=M2) called fron control<M430 Ctrl 
.c:331 

control() called from control ctrl.c:320 
t s I3ebbc 
op = *0 


#0 

p : 10 
dclick : 
pO = *2 
obut = #0 


«0 


P Open 


Q 


/help/edit/stf Close! Get? 


Open 

Pattern ’* 
Text »* 

Cut Paste 

Write New 


Snarf 


/help/cbr/stf Close! Get! 


nk src decl uses *.c 


/help/db/stf Close! Get! 


ps pc regs broke 

st a^k kstack nextkstack 


/help/raail/stf Close! Get! 


headers messages delete reread send 


/nail/box/rob/mbox _ /bln/help/mail/j 

1 chkiSalias.com Tue Apr 16 19:30 EOT 

2 sjean Tue Apr 16 19:26 EOT 

3 attunixirrg Tue Apr 16 19:03 EOT 1991 

4 knightttMRCO.CARLETON.CA®mitvma .mit. ed 
u Tue Apr 16 19:01 EOT 

5 deutschttPARCPLACE.C0H@nitvna.mit.edu 
Tue Apr 16 18:54 EOT 

6 howard Tue Apr 16 15:02 EOT 

7 deutschttPARCPLACE.COM^mitvma.rait.edu 
Tue Apr 16 12:52 EOT 


Figure 7: After applying db/stack to the broken process. 

Notice that this new window has many file names in it. These are extracted from the symbol table of the broken program. I can 
look at the line (of assembly language) that died by pointing at the entry /sys/src/libc/mips/strchr. s :34 and exe¬ 
cuting Open, but I’m sure the problem lies further up the call stack. The deepest routine in help is textinsert, which calls 
strlen on line 32 of the file text. c. 1 point at the identifying text in the stack window and execute Open to see the source. 


| headers help/Boot Exit | 




1 

Fron sean Close! | 

- 


/help/edit/stf Close! Get! 

" From sean Tue Apr 16 19:26:14 EDT 1991 
i tried your now help and got this: 

help 176153: user TLB niss (load or fetch) badvaddrmOxO 
help 176153: status=0xfb0c pc=0xl8df4 sp=0x3f4e8 


Open 

Pattern * 1 

Text ’' 

Cut Paste Snarf 

Write New 


/usr/rob/src/help/ 176153 stack Close!) 



last exception: TLB niss (load or fetch) 
/sys/src/libc/mips/strchr.s:34 strchr+H68: MOVW 0(R3), R5 
strchr(c=»3c, s=#0) called from strlen-Mlc /sys/src/libc/port 
/strlen.c:7 

strlen(smftO) called from textinsert+#30 tjext.c:32 
textinsert(sel=til, t=#40e60, s=#0, qO=IO, full=ll) called fro 
m errs+#e8 errs.c:34 
n = #3d7cc 

errs(s=#0) called from Xdie2+4I14 exec.c:252 
p = I40d88 

Xdie2() called from lookpu+»c4 exec.c:101 

£ 


Open nk src decl uses *.c 


/help/db/stf Close! Get! 

[ 

ps pc regs broke 

stack kstack nextkstack 



1 

headers messages delete reread send 

| 

/usr/rob/src/help/text.c Close! Get! j 


/nail/box/rob/mbox /bin/help/mail/] 

if(sel) 


1 chkiSalias.com Tue Apr 16 19:30 EDT 

2 sjean Tue Apr 16 19:26 EDT 

3 attunixirrg Tue Apr 16 19:03 EDT 1991 





1 

pO = qO-t->org; 
if(pO < 0) 

t->org ♦= n; 

else if(pO <= t->nchars) 

frinsert(t, as, pO); 
t->qO = qO; 
t if('full) 


1 

4 knightttHRCO.CARLETON.CAGmitmva.nit.ed 
u Tue Apr 16 19:01 EDT 

5 deutschttPARCPLACE.COMSnitvna.mit.edu 
Tue Apr 16 18:54 EDT 

6 howard Tue Apr 16 15:02 EDT 

7 deutschttPARCPLACE. COK«Smitvma.nit.edu 
Tue Apr 16 12:52 EDT 


Figure 8: After Opening text. c at line 32. 


Graphics Interface ’92 
























The problem is coming to light: s, the argument to strlen, is zero, and was passed as an argument to textinsert by the 
routine errs, which apparently also got it as an argument from Xdie2. I close the window on text .c by hitting Close 1 in 
the tag of the window. By convention, commands ending in an exclamation mark take no arguments; they are window opera¬ 
tions that apply to the window in which they are executed. Next I examine the source of the suspiciously named Xdie2 by 
pointing at the stack trace and Opening again. (See Figure 9.) 



headers help/Boot Exit 


l 




From seen Close! | _ 


Fro n sean Tue Apr 16 19:26:14 EOT 1991 
i tried your new help and got this: 

help 176153: user TLB miss (load or fetch) badvaddrsOxO 
help 1761J53: statussOxfbQc pc=0xl8df4 sps0x3f4e8 


/usr/rob/src/help/ 176153 stack Close! 


textinsert(sel=«l, t=«40e60, s=#0, qO=«d, full=#l) called fro 
m errsMeS errs.c:34 
n = H3d7cc 

I errs(s=#0) called fron Xdie2+»14 exec.|c:252 
p = H40d88 

Ixdie2() called froa lookup*#c4 exec.c:101 
lookup(s=»40be8) called from execute-ftso exec.c:207 
i = if 
n = ilcSbf 

execute(t=#3ebbc, pO=M2, pl=«2) called from control**430 Ctrl 
, .c:331 


void 

Xdie2(int argc, char *argv[], Page ♦page, Text ♦curt) 


errs( (iichar* )n): 


/help/edit/stf Close 




/usr/rob/src/help/exec.c Close! Get 


/♦ 

♦ Exact match 
♦/ 

Page* 

findopenl(Page *p, char *nane) 


Pattern ’’ 

Text * * 

Cut Paste Snarf 

Write New 


/help/cbr/stf Close! Get 


Open mk src decl uses *.c 


/help/db/stf Close! Get! _ 


regs broke 
stack nextkstack 


/help/mail/stf Close! Get! 


headers messages delete reread send 


/mail/box/rob/tnbox /bin/belp/mail/| 


ll chkOalias.com Tue Apr 16 19:30 EOT 

2 sjean Tue Apr 16 19:26 EOT 

3 attunixirrg Tue Apr 16 19:03 EDT 1991 

4 knightftMRCG.CARLETON.CAOmitvna.mit.ed 
u Tue Apr 16 19:01 EDT 

5 deutschfcPARCPLACE.COMOnitvma.mit.edu 
Tue Apr 16 18:54 EDT 

6 Howard Tue Apr 16 15:02 EDT 

7 deutschtPARCPLACE.COHOmitvna.mit.edu 
Tue Apr 16 12:52 EDT 


Figure 9: After Opening exec. c at line 252. 

Now the problem gets harder. The argument passed to errs is a variable, n, that appears to be global. Who set it to 
zero? I can look at all the uses of the variable in the program by pointing at the variable in the source text and executing 
uses * . c by sweeping both ‘words’ with the middle button in the C browser tool, /help/cbr/stf. Uses creates a new 
window with all references to the variable n in the files /usr/rob/src/help/*. c indicated by file name and line number. 
The implementation of the C browser is described below; in a nutshell, it parses the C source to interpret the symbols dynami¬ 
cally. If instead l had run the regular Unix command 

grep n /usr/rob/src/help/*.c 

I would have had to wade through every occurrence of the letter n in the program. 
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rr 

headers help/Boot Exit 





■ 





■ 




■ 

From sean Close! | 

■ 


/help/edit/stf Close! 

Get! 



“ From sean Tue Apr 16 19:26:14 EOT 1991 
i tried your new help and got this: 

help 176153: user TLB miss (load or fetch) badvaddr=0x0 
help 1761j53: statusrOxfbOc pc=0xl8df4 sp=0x3f4eB 

r 

E 

i 

Open 

Pattern ’’ 

Text * ’ 

Cut Paste Snarf 

Write Net* 


r 

/usr/rob/src/help/ 176153 stack Close!j 

r 




= 


/help/cbr/stf Close! 

Get! 



B textinsert(sel=ttl, t=H0e6O, s=IO, q0=#d, fullsil) called fro 
hid errs+#e8 orrs.c:34 

■ 





[] Open mk src decl uses *.^ r 



| n = H3d7cc 


■ 


Get!'* | 



/usr/rob/src/help/exec.c Close! Get! | 


1 


wamm 



void 


II 




Xdie2(int argc, char *argv[], Page *page, Text ♦curt) 

errs((uchar*)jij); 



/help/mail/stf Close) 

Get! 




[ 

headers messages delete 

reread send 





/usr/rob/src/help/J 

Close! 



/♦ 

♦ Exact match 
♦/ 

Page* 

findopenl(Page *p, char *name) 

char *s; 
int n; 

Page *q; 



./dat.h:136 
exec.c;213 
exec.c:252 
help.c:35 




Again: 

if(p == 0) 

return p; 






Figure 10: After finding all uses of n. 

The first use is clearly the declaration in the header file. It looks like help, c: 35 should be an initialization. I Open help. c 
to that line and see that the variable is indeed initialized. (See Figure 11; a few lines off the top of the window on help. c is the 
opening declaration of main ().) Some other use of n must have cleared it. Line 252 of exec, c is the call; I know that’s a 
read, not a write, of the variable. So I point to exec. c: 213 and execute Open. 


| headers help/Boot Exit | 




m 

From sean Close! | 

i 


/help/edit/stf Close! 

Get! 

L 

L 

~ From sean Tue Apr 16 19:26:14 EDT 1991 
i tried your new help and got this: 

help 176153: user TLB miss (load or fetch) badvaddrsOxG 
help 1761j53: statussOxfbOc pc=0xlBdf4 sp=0x3f4e8 

: 


Open 

Pa^ern ’ * 

Text ’ ’ 

Cut Paste Snarf 

Write New 

| 

/usr/rob/src/help/ 1761S3 stack Close !| 


■ 


B textinsert(sel=ll, t=*40eG0, srio, q 0 =#d, fullsil) called fro 
■ n errs*tteB errs.c:34 



/help/cbr/stf Close! 

Get! 

1 

1 

□ Open nk src decl uses *.c 

1 

| n = I3d7cc 

| 


/help/db/stf Close! 

Get! 


/usr/rob/src/help/exec.c Close! Get! | 

I 

1 

ps pc regs 

broke 

■ 

■ void 


1 

atach KscacK nexcxstacK j 

1 

|xdiel(int argc, char *argv[], Page *page, Text *curt) 



/help/mail/stf Close! 

Get! 



m 

headers messages delete 

reread send 

■ 

j n = 0; | 



■ 


■ 




■ 




/usr/rob/src/help/l 

Close! 

1 

/usr/rob/src/help/help.c Close! Get! j 


p 

./dat.h:13G 
e^ec.c:213 
exec.c:252 
help.c:35 



I Dir d; 

^ Rectangle r; 





1 n s “a test strinq"; 1 





if (accessC'/pint/help/new" , 0) sr 0)| 

fprint(2, "help: already running\n"); 
exits("running"); 

fn = 0; 

ARGBEGINf 
case ’f’: 











Figure 11: The writing of non line exec, c: 213. 

Here is the jackpot of this contrived example. Sometime before Xdie2 was executed, Xdiel cleared n. I use Cut to remove 
the offending line, write the file back out (the word Put! appears in the tag of a modified window) and then execute mk in 
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/help/cbr to compile the program (a total of three clicks of the middle button). I could now answer Sean’s mail to tell him 
that the bug is fixed. I’ll stop now, though, because to answer his mail I’d have to type something. Through this entire demo I 
haven’t yet touched the keyboard. 


| headers help/Boot Exit 1 


Hi 



L 


From sean Close! | 



/help/edit/stf Close! 

Get! | 

L 

i 


Frora sean Tue Apr 16 19:26:14 EDT 1991 
i tried your new help and got this: 

help 176153: user TLB niss (load or fetch) badvaddr=OxO 
help 1761|53: status=OxfbOc pc=0xl8df4 spr0x3f4e8 

- 


Open 

Pattern 15 

Text ’» 

Cut Paste Snarf 

Write New 

i 


/usr/rob/src/help/ 176153 stack Close !| 


T 





/help/cbr/stf Close! 

Get! I 


SB 

textinsert(sel=fll, t=!40e60, s=l!0, q0=4d, fullsftl) called fro 
m errs+ffeB errs.c:34 



1 

E 


[] Open nk arc decl uses «.c | 

1 


n = I3d7cc 



/help/dSystf Close! 

Get! 



/usr/rob/src/help/exec.cj Close! Get! 


1 

ps pc regs broke 


■ 

void 


li 



1 

r 

Xdiel(int argc, char *argv[], Page *page, Text *curt) 



/help/mail/stf Close! 

Get! 



I 

II 


I 

headers messages delete 

reread send 

■ 





/usr/rob/src/help/| 

Close! 



/usr/rob/src/help/nk (Close! 



./dat.h:136 




vc -w exec.c 

vl help.v clik.v ctrl.v dat.v errs.v exec.v file.v page.v pic 
k.v proc.v scrl.v text.v util.v xtrn.v /nips/lib/libfrane.a - 
lg -lregexp -ldnalloc 



exec.c:213 
exec.c:252 
help.c:35 










Figure 12: After the program is compiled. 

This demonstration illustrates several things besides 
the general flavor of help. Most important, by following 
some simple rules it is possible to build an extremely efficient 
and productive user interface using just a mouse and screen. 
This is illustrated by how help makes it easy to work with 
files and commands in multiple directories. The rules by 
which help constructs file names from context and by which 
the utilities derive the context in which they execute simplify 
the management of programs and other systems constructed 
from scattered components. Also, the few common rules 
about text and file names allow a variety of applications to 
interact through a single user interface. For example, none of 
the tool programs has any code to interact directly with the 
keyboard or mouse. Instead help passes to an application 
the file and character offset of the mouse position. Using the 
interface described in the next section, the application can 
then examine the text in the window to see what the user is 
pointing at. These operations are easily encapsulated in sim¬ 
ple shell scripts, an example of which is given below. 

The interface seen by programs 

As in 8'/:, the Plan 9 window system |Pike9l], help 
provides its client processes access to its structure by present¬ 
ing a file service, although help’s file structure is very dif¬ 
ferent. Each help window is represented by a set of files 
stored in numbered directories. The number is a unique 
identifier, similar to Unix process id’s. Each directory con¬ 
tains- files such as tag and body, which may be read to 


recover the contents of the corresponding subwindow, and 
ctl, to which may be written messages to effect changes 
such as insertion and deletion of text in contents of the win¬ 
dow. The help directory is conventionally mounted at 
/mnt/help, so to copy the text in the body of window num¬ 
ber 7 to a file, one may execute 

cp /mnt/help/7/body file 
To search for a text pattern, 

grep pattern /mnt/help/7/body 

An ASCII file /mnt/help/index may be examined to 
connect tag file names to window numbers. Each line of this 
file is a window number, a tab, and the first line of the tag. 

To create a new window, a process just opens 
/mnt/help/new/ctl, which places the new window 
automatically on the screen near the current selected text, and 
may then read from that file the name of the window created, 
e.g. /mnt/help/ 8 . The position and size of the new win¬ 
dow is chosen by help. 

Another example 

The directory' /help/cbr contains the C browser we 
used above. One of the programs there is called decl; it 
finds the declaration of the variable marked by the selected 
text. Thus one points at a variable with the left button and 
then executes decl in the window for the file 
/help/cbr/stf. Help executes /help/cbr/decl 
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using the context rules for the executed text and passes it the 
context (window number and location) of the selected text 
through an environment variable, helpsel. 

Decl is a shell script, a program for the Plan 9 shell, 
rc [Duff90]. Here is the complete script: 

eval '{help/parse -c} 
x-'{cat /mnt/help/new/ctl} 

{ 

echo a 

echo $dir/' Close!' 

} I help/buf > /mnt/help/$x/ctl 
{ 

cpp $cppflags $file I 

help/rcc -w -g -i$id -n$line I 
sed lq 

} > /mnt/help/$x/bodyapp 

The first line runs a small program, help/parse, that 
examines $helpsel and establishes another set of environ¬ 
ment variables, file, id, and line, describing what the 
user is pointing at. The next creates a new window and sets x 
to its number. The first block writes the directory name to the 
tag line: the second runs the C preprocessor on the original 
source file (it should arguably be run on, say, 
/mnt/help/8/body) and passes the resulting text to a 
special version of the compiler. This compiler has no code 
generator; it parses the program and manages the symbol 
table, and when it sees the declaration for the indicated 
identifier on the appropriate line of the file, it prints the file 
coordinates of that declaration. This appears on standard out¬ 
put, which is appended to the new window by writing to 
/mnt/help/$x/bodyapp. The user can then point at the 
output to direct Open to display the appropriate line in the 
source. (A future change to help will be to close this loop 
so the Open operation also happens automatically.) Thus 
with only three button clicks one may fetch to the screen the 
declaration, from whatever file in which it resides, the decla¬ 
ration of a variable, function, type, or any other C object. 

A couple of observations about this example. First, 
help provided all the user interface. To turn a compiler into 
a browser involved spending a few hours stripping the code 
generator from the compiler and then writing a half dozen 
brief shell scripts to connect it up to the user interface for dif¬ 
ferent browsing functions. Given another language, we 
would need only to modify the compiler to achieve the same 
result. We would not need to write any user interface soft¬ 
ware. Second, the resulting application is not a monolith. It 
is instead a small suite of tiny shell scripts that may be tuned 
or toyed with for other purposes or experiments. 

Other applications are similarly designed. For exam¬ 
ple, the debugger interface, /help/db, is a director}' of ten 
or so brief shell scripts, about a dozen lines each, that connect 
adb to help. Adb has a notoriously cryptic input language; 
the commands in /help/db package the most important 
functions of adb as easy-to-use operations that connect to the 
rest of the system while hiding the rebarbative syntax. People 
unfamiliar with adb can easily use help's interface to it to 
examine broken processes. Of course, this is hardly a full- 
featured debugger, but it was written in about an hour and 


illustrates the principle. It is a prototype, and help is an 
easy-to-program environment in which to build such test pro¬ 
grams. A more sophisticated debugger could be assembled in 
a similar way, perhaps by leaving a debugging process resi¬ 
dent in the background and having the help commands send 
it requests. 

Discussion 

Help is a research prototype that explores some ideas 
in user interface design. As an experiment it has been suc¬ 
cessful. When someone first begins to use help, the profu¬ 
sion of windows and the different ground rules for the user 
interface are disorienting. After a couple of hours, though, 
the system seems seductive, even natural. To return at that 
point to a more traditional environment is to see how much 
smoother help really is. Unfortunately, it is sometimes nec¬ 
essary to leave help because of its limitations. 

The time is overdue to rewrite help with an eye to 
such mundane but important features as undo, multiple win¬ 
dows per file, the ability to handle large files gracefully, sup¬ 
port for traditional shell windows, and syntax for shell-like 
functionality such as I/O redirection. Also, of course, the 
restriction to textual applications should be eliminated. 

One of the original problems with the system — inade¬ 
quate heuristics for automatically placing windows — has 
been fixed since the first version of this paper. The rule it fol¬ 
lows is first to place the new window at the bottom of the col¬ 
umn containing the selection. It places the tag of the window 
immediately below the lowest visible text already in the col¬ 
umn. If that would leave too little of the new window visible, 
the new window is placed to cover half of the lowest window 
in the column. If that would still leave too little visible, the 
new window is positioned over the bottom 25% of the column 
and minor adjustments are made so it covers no partial line of 
existing text, which may entail hiding some windows entirely. 
This procedure is good enough that I haven't been encour¬ 
aged to refine it any further, although there are probably 
improvements that could still be made. A good rule to follow 
when designing or tuning interfaces is to attend to any clumsi¬ 
ness that draws your attention to the interface and distracts 
from the job at hand. I believe the heuristic for placing win¬ 
dows is good enough because I don't notice it; in fact I had to 
read the source to help to recall what it w-as. 

Help does not exploit the multi-machine Plan 9 envi¬ 
ronment as well as it could. The most obvious example is 
that the applications run on the same machine as help itself. 
This is probably easy to fix: help could run on the terminal 
and make an invisible call to the CPU server, sending 
requests to run applications to the remote shell-like process. 
This is similar to how nmake [Fowl90] runs its subprocesses. 

If imitation is the sincerest form of flatter}-, the design¬ 
ers of Oberon’s user interface will (I hope) be honored by 
help. But Oberon has some aspects that made it difficult to 
adapt the user interface directly to UNIX-like systems such as 
Plan 9. The most important is that Oberon is a monolithic 
system intimately tied to a module-based language. An 


Graphics Interface ’92 



293 


Oberon tool, for instance, is essentially just a listing of the 
entry points of a module. In retrospect, the mapping of this 
idea into commands in a Unix directory may seem obvious, 
but it took a while to discover. Once it was found, the idea to 
use the directory name associated with a file or window as a 
context, analogous to the Oberon module, was a real 
jumping-off point. Help only begins to explore its 
ramifications. 

Another of Oberon’s difficulties is that it is a single- 
process system. When an application is running, all other 
activity — even mouse tracking — stops. It turned out to be 
easy to adapt the user interface to a multi-process system. 
Help may even be superior in this regard to traditional shells 
and window systems since it makes a clean separation 
between the text that executes a command and the result of 
this command. When windows are cheap and easy to use 
why not just create a window for every process? Also, 
help’s structure as a Plan 9 file server makes the implemen¬ 
tation of this sort of multiplexing straightforward. 

Help is similar to a hypertext system, but the connec¬ 
tions between the components are not in the data — the con¬ 
tents of the windows — but rather in the way the system itself 
interprets the data. When information is added to a hypertext 
system, it must be linked (often manually) to the existing data 
to be useful. Instead, in help, the links form automatically 
and are context-dependent. In a session with help, things 
start slowly because the system has little text to work with. 
As each new window is created, however, it is filled with text 
that points to new and old text, and a kind of exponential con¬ 
nectivity results. After a few minutes the screen is filled with 
active data. Compare Figure 4 to Figure 11 to see snapshots 
of this process in action. Help is more dynamic than any 
hypertext system for software development that I have seen. 
(It is also smaller: 4300 lines of C.) 

The main area where help has not been pushed hard 
enough is, in fact, its intended subject: software development. 
The focus has been more on the user interface than on how it 
is used. One of the applications that should be explored is 
compilation control. Running make in the appropriate direc¬ 
tory is too pedestrian for an environment like this. Also, for 
complicated trees of source directories, the makefiles 
would need to be modified so the file names would couple 
well with help's way of working. Make and help don’t 
function in similar ways. Make works by being told what tar¬ 
get to build and looking at which files have been changed that 
are components of the target. What’s needed for help is 
almost the opposite: a tool that, perhaps by examining the 
index file, sees what source files have been modified and 
builds the targets that depend on them. Such a program may 
be a simple variation of make — the information in the 
makefile would be the same — or it may be a whole new 
tool. Either way, it should be possible to tighten the binding 
between the compilation process and the editing of the source 
code; deciding what work to do by noticing file modification 
times is inelegant. 


There have been other recent attempts to integrate a 
user interface more closely with the applications and the oper¬ 
ating system. ConMan and Tel [Haeb88,Oust90] are note¬ 
worthy examples, but they just provide interprocess commu¬ 
nication within existing environments, permitting established 
programs to talk to one another. Help is more radical. It 
provides the entire interface to the screen and mouse for both 
users and programs. It is not an extra layer of software above 
the window system; instead it replaces the window system, 
the toolkits, the command interpreter, the editor, and even the 
user interface code within the applications. 

Perhaps its most radical idea, though, is that a better 
user interface can be one with fewer features. Help doesn’t 
even have pop-up menus; it makes them superfluous. It has 
no decorations, no pictures, and no modes, yet by using only 
a bitmap screen and three mouse buttons (one of which is 
underused) it provides a delightfully snappy and natural user 
interface, one that makes regular window systems — includ¬ 
ing those I have written — seem heavy-handed. Help 
demonstrates that the ideas of minimalism, uniformity, and 
universality have merit in the design of human-computer 
interfaces. In the years to come, as the machines and their 
input methods become more complex, those principles will 
have to be followed ever more assiduously if we are to get the 
most from our systems. 
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Abstract 

In this article, we present a model for building interactive 
graphical applications based on multi-layer graphics. With this 
model, a variety of components of an interactive graphical 
editor (transient objects like the selection rectangle, selected 
objects, grids, cursors), as well as the handling of input events, 
are realized in a more straightforward way. 

We show how an interactive graphical application can use 
layered graphics in a simple and effective way to represent 
direct manipulation, some graphic constraints and event 
handling using various input devices. By clarifying the status 
of abstract elements of an interactive application, the model 
encompasses a significant part of the dynamic aspect of the 
interaction. 


R£sum6 

Dans cet article, nous pr6sentons un module bas£ sur le 
graphique multi-couches pour le developpement d’applications 
graphiques interactives. En utilisant ce modele, il est possible 
d’attribuer un statut clair k de nombreux dements composant 
un yditeur graphique interactif (les objets «fugaces» comme le 
rectangle utilisd pour s&ectionner, les objets s£lectionn6s, les 
grilles d’alignement ainsi que les curseurs). 

Nous montrons comment une application graphique interactive 
peut tirer profit du modele graphique multi-couches pour 
d6crire simplement la manipulation directe, certaines 
contraintes graphiques ainsi que la gestion des 6v6nements 
produits par divers pdriph^riques d’entr6e. En attribuant un 
statut clair & ces composantes abstraites des applications 
graphiques interactives, le module aide k la construction d’une 
partie importante de la description du comportement 
dynamique de 1* interaction. 

KEYWORDS: User Interface Design, Interface Metaphors, 
Input Devices. 


Introduction 

Building interactive graphical programs is still a difficult task. 
Some toolkits [14,11] propose a framework for building task- 
specific editors, but the kind of editors they can produce are 
stereotypical in several aspects: 

• they only consider keyboard and mouse input, 

• their graphic model is tied to the PostScript model, 

• their interaction is based on stereotypical graphic objects like 

button boxes and menus. 


* they offer only stereotypical mouse- and keyboard-based 
manipulations. 

We believe that these limitations are currently too strong 
because of new, easily available input devices; both 2D 
(Wacom pressure sensitive styli, touch screens with multi 
touch sensitivity, eye trackers) and 3D (data gloves, flying 
mice, 3D styli). Since the list of devices recognized by current 
toolkits is “hardwired” into the toolkit, almost no support is 
provided for application programmers to handle these new 
kinds of devices for input and to support graphic feedback for 
their actions. 

We propose here a model based on multi-layered graphics and 
multi-layered input handling. 

This model is also suitable for describing traditional mouse- 
and keyboard-based interactive programs, simplifying the 
handling of interaction, the handling of transient shapes, of 
selected graphic objects, of constraint representations (e.g. a 
grid) and of device feedback. 

Most graphical toolkits distinguish two levels for displaying 
graphics: a virtual surface where the graphics are drawn and 
one or several viewports which show some part of the virtual 
surface (see Fig. 1). 

In this model, the graphics on the virtual surface are produced 
by a graphic controller, a module which transforms an 
abstract data structure into a graphic data structure which can 
be displayed. Passing from the graphic data structure to the 
virtual surface is done by a Tenderer module (see fig. 2). 
Projecting the virtual surface onto a visible viewport is done 
through a transformation and a clipping zone. This model is 
used by most graphical systems and the mechanisms to 



Virtual surface Visible Viewport 

figure 1: the two levels of graphics in toolkits 
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management is done by an internal part of the toolkit. This 
internal part is hard to adapt to new interaction and appearance 
styles. 


The Multi-Layered Graphic Model 

We propose a model where the visible surface can display 
several virtual surfaces, like stacked transparent celluloids (see 
figure 3); a virtual surface supports a graphical structure. The 
same virtual surface can be shown on several different visible 
surfaces, at different levels. This is an extension to the multi¬ 
view concept. 

This model is useful to specify both the display of graphics 
and the handling of events. Layers are drawn from the bottom 
(background) to the top (foreground) whereas events are 
handled from the topmost layer to the bottommost. In the 
following sections, we discuss the use of layers in typical 
applications, the handling of graphics, event handling, the 
status of various components in our model and finally an 
example of its use. 


implement it can be found in windowing systems like X [12], 
the Macintosh Toolbox [2], and by graphical toolkits like 
Interviews [9], ET++ [15] or Garnet [11]. 

This model is well adapted to the display of graphics but is of 
little help for interaction. For instance, transient objects, like 
handles or sliding shapes, have no clear graphic status. They 
do not belong to the graphic data structure so their display and 



Layers 

While using the Xtv toolkit developed at LRI [3], experiments 
with several interactive graphical layouts led us to distinguish 
between the following layers: 

• Background!model layer : this layer displays a background 

image. Conceptually, areas covered by no other layer 
contain the background image. Window systems usually 
handle this layer in a special way. X, for instance, has a 
background image (generally a solid color) for each 
window. In programs like Illustrator™, this layer can 
contain a drawing which is then used as a template for the 
main drawing, like with tracing paper. In our model, this 
layer shows “default" graphics and handles events ignored 
by the other layers. 

• Graphical constraints visualization layer: this layer usually 

displays a grid or other graphical formalism for 
representing geometrical constraints. 

• Application data layer: this layer, as in figure 1, displays the 

graphical objects representing the application’s internal 
objects. Event handling in this layer is described later. 

• Selected objects layer : this layer displays the selection, using 

shapes ( e.g. handles) expressing the kinds of 
manipulations available. Event handling in this layer 
supports direct manipulation. 

• Lexical operations representation layer: this layer displays 

shapes expressing the status of input devices, like cursors, 
as well as transient shapes {e.g. “zoom animation".) 
Obviously, this list is not exhaustive; other layers can be found 
for specific applications. It does, however, apply to many 
applications (MacDraw, Illustrator, Idraw, etc.) 

The model can manage many visible surfaces (multi-views): 
each virtual surface can be seen by any number of visible 
surfaces and each visible surface can display a virtual surface 
at any level. 

Hence, the selection in a given view can differ from the 
selection in another view. The constraint grid can appear in 
one view and not in another, and be different in a third. A view 
can use a visible surface as a model whereas another uses it as 
the application data layer. See the example application at the 
end of the paper. 

The Transformation Model 

As explained in the previous section, the display of a virtual 
surface on a visible surface is done with a transformation and 
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Figure 4: composition of structured graphic objects. 


In order to keep the simplicity of this mechanism with the 
multi-layer model, graphic objects should be able to actively 
control their drawing process to bypass the standard 
visualizing transformation. As explained in the previous 
paragraph, the redisplay algorithm calls a drawing function 
with a region to draw as an argument. In our model, the 
visualization transformation should also be passed as a 
parameter to the drawing function of each layer. 

It should be noted that when a graphic structure is modified 
and the redisplay algorithm is notified that an area should be 
updated, the area cannot be computed just once for the abstract 
surface and propagated to each visible surface where it 
appears, because, here again, the treatment of the 
transformation has to be taken into account for each visible 
surface in order to compute the exact area. 

The lazy redisplay mechanism used by the Macintosh toolbox, 
as well as Interviews, ET++ and Xtv, can be easilly adapted to 
our model. 

Note that no assumtions are made about the graphic model 
used to draw in the application data layer. The multi-layered 
model can be used with bitmap graphics with an alpha value, 
enhanced bitmap graphics (supporting zoom), bitmap graphics 
with some structured primitives (QuickDraw [2], X), device- 
independant 2 1/2D painting (PostScript [1]) or 3D (PHIGS). 


through a clip. The graphic structure is displayed on the visible 
surface analogeously to the way structured graphics packages 
(GKS [6], PHIGS [7]) build complex objects: by composition 
of simple objects, geometrically transformed (see figure 4). 

In the mono-surface model, the visible surface is responsible 
for scrolling and zooming in 2D, or changing the view point in 
3D (these operations are sometimes called non-semantic 
manipulations.) The transformation is therefore associated to 
the visible surface. 

Our model retains similar properties but requires that some 
objects are aware of the transformation applied to them and 
can manage it explicitly when being drawn on the visible 
surface. Specifically, objects can behave in three ways to 
redraw themselves with a specific transformation: 

• use the transformation for all components uniformly, 

• use the transformation for positioning but not for dimensions 

(lines width, handles width,...), 

• don’t use the transformation at all. 

By construction, the application data layer uses the trans¬ 
formation for all components, since the transformation model 
has been chosen for this purpose. The selection and grid 
objects use the second behavior; they usually draw lines using 
a constant width of one pixel. The third is sometimes used to 
present graduated rules or display information like locator 
coordinates. 

Implementation Issues 

With the mono-layered model, most toolkits use a “lazy” 
redisplay mechanism: when a part of the visible surface should 
be updated, a function is called for the visible surface with the 
region to repaint as an argument. The toolkit is then 
responsible for redrawing the region. This mechanism is used 
either because that part has been made visible or because the 
graphics under it have changed ( i.e . an object is created, 
modified or removed.) In the latter case, the data structure is 
updated and, for each visible surface where the virtual surface 
appears, the region to update is cleared and the redisplay 
mechanism is triggered. 


Handling of 3D 

Note also that the model is, to some extent, suitable for 
interactive manipulation of 3D graphics. Most 3D editors use a 
“wire frame” representation of 3D structures. The vertices are 
displayed using a transformation and manipulated using some 
2D or 3D locator. The virtual surface contains 3D objects 
(virtual volume would be a better name) and the visible 
surfaces present a projection — through a visualizing 
transformation — of the virtual volume. The list of layers 
given for a conventional 2D graphical application still applies 
for a 3D editor: 

• the background usually displays a solid color (either black or 

grey), 

• the graphical constraints visualization layer can display a 3D 

grid or a trajectory, 

• the application data layer displays the 3D objects, 

• the selected objects layer displays handles of selected 

vertices or objects, 

• the lexical operations representation layer displays the 

representation of the input device (the projection of a 3D 
mouse for instance). 

The model cannot be used when objects are edited with their 
hidden surfaces removed because the cursor and selection 
should be consistent with the 3D structure. This case is, 
however, very unusual. 

Handling of Events 

A major benefit of multi-layer graphics is the simplicity of 
interaction handling. Each layer handles only those events for 
which it has expressed interests; control is distributed. When it 
receives an event, a layer can either: 

• ignore it and pass it to layers below, 

• handle it, 

• handle it, then pass it to layers below, 

• handle it, transform it and then pass it to layers below. 

For example, figure 5 describes how events are handled by 
each layer if the selection mode works as follows: 
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Layer 

Event 

Behavior 

Lexical: 

PointerMoveDown 

If a selection rectangle 
exists on the lexical 
layer, set its moving 
comer to the mouse 
position. 

Else, pass the event. 


PointerUp 

If a rectangle exists on 
the lexical layer, delete 
the rectangle, then, pass 
a new event called 
SelectRect containing 
the rectagle boundings. 
Else, pass the event 

Selection: 

PointerMoveDown 

If the selection was 
being moved, move it 
with the pointer. 

If it was not moving and 
a ghost is under the 
pointer, move it with 
the pointer. 

Else, pass the event 

Data: 

PointerDown 

If an object is under the 
pointer, select it (create 
a ghost for it). 

Else, pass the event 


SelectRect j 

Select the objects inside 
the rectangle. 

Background: 

PointerDown 

Clear the selection, then, 
create a rectangle on the 
lexical layer at the 
mouse position. 


Figure 5: description of the selection 


• When the mouse is clicked outside any object, the selection 

is cleared. 

• If the mouse is then dragged, a selection rectangle is drawn 

and follows the mouse. 

• If the mouse button is released, objects inside the rectangle 

are selected and the rectangle is deleted. 

• When the mouse is clicked on a graphic object, it becomes 

selected. 

• If the mouse is clicked over a selected object and then 

dragged, a ghost* of the selection follows the pointer. 

• When the mouse buttons are released, the ghosts are deleted 

and the objects they represent are moved to the new 
position. 

Figure 5 also shows the declarative aspect of event handling 
with our model. Note that the selection rectangle is created by 
the background layer, because it is a default action, performed 
when no other layer handles the event more specifically. 

For exotic devices, the lexical layer can also represent more 
sophisticated feedback than the traditional cursor. In our 
animation program for instance, we use a pressure sensitive 
wireless Wacom digitizer where the hot spot is represented as 
a cross and the pressure as a circle, with a radius proportional 
to the pressure. A sketching program using the digitizer to 
draw like a pencil would simply add some actions to the layers 
above (see figure 6). 


1 We call ghost a transient graphic object which represents the 
selection of a graphic object within the main data layer. 


Layer 

Event 

Behavior 

Lexical: 

DigitMoveUp 

Draw a cross (in Xor) 
centered at the hot spot 
and pass the event. 


DigitDown 

Draw the cross and put 
the position and the 
pressure in a new list 


DigitMoveDown 

Erase the cross, draw a 
line from the previous 
point to the current poin 
using the current 
pressure to compute its 
width, draw the cross at 
the current point and 
add the point and 
pressure to the list. Pass 
the event 


DigitUp 

Erase the cross, mark 
the area where the lines 
were traced as an area to 
redisplay, then, pass a 
new event called 
DigitTrace containing 
the list 

Data: 

DigitTrace 

Create an object with 
the list of points and 
select it 


Figure 6: adding actions for a pressure sensitive digitizer. 


Logical Status of Layers 

Most toolkits use a model inspired from the Smalltalk Model- 
View-Controller [8] (MVC) model: ET++ uses MVC, Inter¬ 
views calls it Subject-View, Garnet uses a one-way constraint 
system. The idea of these models is to distinguish between 
object and representation and to provide a mechanism to keep 
them coherent. The PAC (Presentation, Abstraction, 
Controller) model of [4] is a generalization of this concept of 
separation. By using it, we can further clarify the notion of 
layer in our model. 

In graphical applications, some abstract structure is to be 
displayed (see fig. 2). The graphic structure is considered as 
the presentation of the abstract structure. The graphic 
controller is responsible for keeping the graphical structure 
coherent with the abstract structure. It is also responsible for 
receiving input events and handling them according to the 
graphic semantics of the application. In most interactive 
graphic applications, the controller of this layer performs only 
"hit detection", that is, receives input events and determines 
which object they designate. The manipulation - either direct 
or through menus - is usually performed on selected objects. 
With our model, we can describe the relationship between the 
graphic structure and the representation of its selection (its 
ghost) by a similar process (see fig. 7). The handles 
representing the selection are a presentation of the state “being 
selected” of the graphic structure. 

As shown in the previous section, the ghost appears in the 
selected objects layer. In our model, the ghost is considered as 
a view of the graphical data structure representing the object in 
the main data layer. Most of the time, only geometric 
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Presentation 


Abstraction 


Figure 7: relationship between a graphic and its selection. 


information is required to draw a ghost. The graphical aspect 
of the ghost can therefore be defined independently of the 
application. Of course, in an object-oriented implementation, 
the ghost may be specialized to display some application- 
specific information. 

The controller is responsible for maintaining the graphical 
coherence between the graphic structure of the main data layer 
and its ghost. It also interprets input events and handles direct 
manipulation. During direct manipulation, the controller 
maintains a corresponding graphic echo and can try to affect 
the manipulation by constraining input events (e.g. mouse 
move). The constraints here can be either lexical, syntactic or 


Data layer 



Figure 8: PAC relations between layers. 


semantic, depending mainly on performance issues. Lexical 
constraints (like grid alignment constraints) are independant of 
the graphic data structures and of the semantics of the abstract 
data structure beeing manipulated. Syntactic constraints (like 
non-overlapping constraints) depend on the graphical structure. 
Semantic constraints (like hilighting only valid targets when 
interactively connecting two components of a graphic 
structure) depend on the graphic structure and the semantic 
structure. 

Once the manipulation is terminated, the controller can modify 
the abstract data structure or ignore the effects if the 
manipulation is not considered valid. 

This above structuration shows a recursive PAC organization 
(see figure 8). 

The PAC model is also applicable to the graphical constraints 
visualization layer, which displays a presentation of some 
constraints. For example, if we use a simple grid alignment, 
the abstacdon is composed of four values: the grid spacing and 
the offset from the origin. From this abstraction, the grid can 
be displayed. The abstraction is then used during the direct 
manipulation phase to align the input device events to grid 
values (see figure 9). 

Finally, with the PAC model, the lexical operations 
representation layer displays some presentation of device 
states (the mouse cursor, for instance). 

Each layer contains the presentation and handles the 
interaction for a specific category of abstractions. 

Example of Use in an Application 

With the multi-layer model, we have built a computer aided 
cartoon animation editor. The Unidraw toolkit, which offers a 
rich set of objects, has been modified for this purpose. 

An animator can use both a mouse and a pressure sensitive 
Wacom stylus with multiple levels of pressure. The layers it 
uses are shown in figure 10. 






9 t 



Presentation 



Abstraction 


Figure 9: PAC model for a grid alignment constraint. 
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Figure 11: Interface of an Editor using the Layers model. 


Background Grid Drawing Selection Digitizer echo Lexical feedback 



Figure 10: layers of a 2D animation graphic editor. 


Compared with a conventional 2D graphic editor, we have 
added the Digitizer echo layer which displays a cross centered 
at the hot spot of the stylus, with a width equal to the selected 
drawing line width. When the stylus touches the digitizer, a 
circle appears with a radius proportional to the pressure 
applied, as shown in figure 11. 

The graphic editor can show multiple views of the same 
drawing (in the main data layer). Each view also contains the 
Digitizer echo layer. This feature is useful when working on a 
small area of a drawing, which is usually zoomed on one view 
but not on another. The cursor provides the visual insurance 
that the two views display the same drawing. 

The background layer can contain other drawings, greyed out. 
This feature is meant to emulate the use of tracing paper. 

Related Work 

The multi-layer model is not completely new; previous 
systems have used multiple layers for graphic output or event 
handling. However, no model generalizes this notion as ours 
does. 

Some window systems offer support for stacked windows, 
either for graphic output or for event handling. NeWS [13] has 
a special type of window called “overlay canvas” which is 
used to display transient data. Overlay canvases optimize the 
redisplay by relaxing the drawing model of PostScript. They 
usually use either an overlay plane of the screen on the root 
window when the hardware offers such a device, or draws all 


the shapes using Xor Taster operations. Overlay canvases offer 
just one level of layering, which is a strong limitation. 

The X window system offers transparent input-only windows 
for event handling. As their name implies, no graphic output 
can be done on such windows. 

The HyperCard [5] system has a two layer model for handling 
graphics and events, which is very close to our model. 
However, HyperCard is not extensible and can not be 
considered as a complete toolkit. 

It is also interesting to notice that some real devices do provide 
multi-layered graphics, like heads-up displays on military 
planes. 

Future work 

We are currently working on a graphic editor for designing 
interactive graphical applications, using this model to express 
event handling graphically. 

Conclusion 

The multi-layered multi-view model simplifies both the 
management of graphical output and the description of event 
handling for interactive graphical applications. It clarifies the 
realization of objects appearing in a graphical application, like 
selection, cursor, grid or selection rectangle. It also permits a 
clear description of event handling for a variety of input 
devices. We believe this model can unify the handling of a 
variety of problems that are currently solved with ad hoc 
approaches. 
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Abstract 

Constraints provide a natural formalism for user-interface 
design and graphical layout. Recent results and algo¬ 
rithms from symbolic computation and geometry pro¬ 
vide new techniques to manipulate linear arithmetic con¬ 
straints. We show how these results can be applied to in¬ 
teractive graphical user-interfaces and how they extend 
the capabilities of previous interactive constraint-based 
user interface systems. We propose an architecture for 
such systems based on these techniques. 

Keywords: Linear Arithmetic Constraints, Interactive 
Techniques 

1 Introduction 

Constraints are a natural formalism for specifying user- 
interface design and graphical layout. They have shown 
their utility in interactive systems [1, 2, 3, 11, 12, 13, 14, 
15], in graphic specification languages [5, 17] and recently 
in visual language parsing [4]. 

The requirements of adequate response time and graphi¬ 
cal feedback for interactive systems demand certain capa¬ 
bilities of constraint solvers. In particular, previous work, 
for example ThingLab and ThingLab II [2, 11] suggests 
that constraint solvers for interactive graphics must pro¬ 
vide: 

• Incremental addition and deletion of constraints. 

• Fast generation of plans of execution when the ob¬ 
ject that is the focus of manipulation changes. 

• Adequate feedback bandwidth when manipulating 
a graphic object. 

Other capabilities which constraints can provide and which 
an interactive constraint-based system should support 
are: 

• Determining and presenting the range of values 
that a variable can take. For example, if a point 
is going to be dragged around on the screen, the 


system should be able to present graphically to the 
user whereabouts this point can be moved. 

• Support the definition and compilation of com¬ 
pound objects. Although, this is straightforward in 
systems without constraints, the addition of con¬ 
straints raises new issues. In particular, efficiently 
representing the constraints in the compound ob¬ 
ject, and determining which variables of a com¬ 
pound graphic object define all objects it contains. 

In addition the constraint solver must support other, 
more usual, operations associated with constraints. These 
include: 

• Detecting that a system of constraints is unsatis- 
fiable, and identifying which constraints must be 
removed to restore satisfiability. 

• Detecting an under-constrained system and identi¬ 
fying which variables must be further constrained. 

Both the constraints causing unsatisfiability, and vari¬ 
ables that need to be further constrained must be indi¬ 
cated to the user. 

Recent results and algorithms from symbolic computa¬ 
tion [6, 8, 9] provide powerful techniques to manipu¬ 
late sets of linear arithmetic constraints containing both 
equalities and inequalities. We show that when applied to 
interactive constraint-based user-interface systems, these 
techniques give new capabilities, and enhance interaction 
through improved user-feedback. In particular, they pro¬ 
vide: 

1. A canonical form for a set of linear arithmetic con¬ 
straints. The canonical form is concise, does not 
contain redundant constraints, identifies the de¬ 
grees of freedom in the constraints, and makes ex¬ 
plicit the equalities implied by the constraints. 
There is an incremental algorithm to compute the 
canonical form. This algorithm also determines 
whether or not the constraints are satisfiable. The 
canonical form is a key for both efficient represen¬ 
tation and manipulation of constraints. 
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2. A parametric solved form for the solutions of a set 
of constraints in terms of distinguished paramet¬ 
ric variables. The solved form, which corresponds 
to the plans of Thinglab II, allows the values of 
variables to be computed rapidly from the param¬ 
eters. This permits the rapid re-satisfaction of the 
constraints when objects are being manipulated by 
the user. 

3. An efficient projection algorithm to compute the 
range of values of distinguished variables. This 
allows the user to get feedback about the region 
within which selected objects on the screen can be 
moved while still satisfying the constraints. Pro¬ 
jection also plays a role in computing the manipu- 
lable interface of compound objects. 

4. Techniques to deal with unsolvable sets of con¬ 
straints by identifying minimal unsolvable subsets. 
These provide feedback as to which constraints need 
to be relaxed or removed to restore solvability. 

5. Techniques to determine if a set of constraints is 
underconstrained, that is if some variables cannot 
be uniquely determined in terms of certain distin¬ 
guished parameters, and consequently which vari¬ 
ables must be further constrained. 

This work extends previous constraint technology for in¬ 
teractive constraint-based systems in two ways. 

First, we present new feedback mechanisms and inter¬ 
action styles for constraint-based systems. These are 
based on new techniques for extracting information about 
causes of over- and under- constrainedness in sets of con¬ 
straints, and determining relationships on specific vari¬ 
ables implied by the constraints. These techniques rely 
on manipulating constraints symbolically. To our knowl¬ 
edge these features have not appeared in previous sys¬ 
tems. 

Secondly, we extend previous work that uses symbolic 
techniques to solve constraints by allowing simultane¬ 
ous linear equations and inequalities. This extension is 
important because such constraints arise naturally when 
specifying graphical layout. For efficiency reasons, pre¬ 
vious systems have dealt mainly with systems of acyclic 
linear equations [3, 11, 15], and solved these constraints 
using local propagation techniques. Recent research has 
addressed the efficient and incremental recompilation of 
these types of constraints in response to user interac¬ 
tion [11]. For the more general constraints considered 
here, however, local propagation is not sufficient; global 
techniques must be used. We note that Witkin [16] and 
Nelson [12] deal with more powerful constraints, but use 
numerical techniques for constraint satisfaction. It is not 
clear how the feedback mechanisms we present can be 
provided using these numerical techniques. 

The rest of the paper is organized as follows. In the 
next section, we present an example of a typical inter¬ 
active session which illustrates these new techniques. In 


Section 3, we discuss two key elements of the underly¬ 
ing constraints technology: a canonical representation for 
constraint, and a new projection algorithm particularly 
suited to this application. In Section 4, we propose an 
architecture for a constraint manipulation sub-system to 
be used within an interactive user interface system. In 
Section 5 we present some empirical results concerning 
the performance of this constraint technology. 

2 Example 

To illustrate some of the capabilities of the constraint 
technology, we present a simple hypothetical session with 
an interactive constraint-based editor. 

Consider the diagram illustrated in Figure 1 which con¬ 
sists of two pieces of text T1 and T2, two rectangles ill 
and R2 and a surrounding box B. Suppose that the user 
wishes to satisfy the following requirements. 

1. ill and R2 have a fixed aspect ratio. 

2. ill and il2 have the same size and cannot overlap. 

3. Ti is centered in Ri and Ri is large enough to con¬ 
tain Ti. 

4. B contains both ill and il2, and they are “nicely” 
placed inside B % that is the rectangles are equidis¬ 
tant from the borders of B and from each other. 


Textl 


Text2 

RI 


£2_ 


B 


Figure 1: Layout of the diagram 

To create this diagram, the user adds and deletes graphic 
objects and constraints and modifies the parameters (or 
attributes) of the objects. The resulting layout is de¬ 
fined by a set of constraints. These fall into the follow¬ 
ing categories: local constraints, which express the rela¬ 
tionships between the modifiable parameters of system- 
defined objects; global constraints, provided by the user, 
which express the relations between objects; and an¬ 
chor constraints which express that certain points or at¬ 
tributes are fixed. 

Typically, local constraints are pre-defined for each ob¬ 
ject, and are highly redundant to allow multiple ways to 
define an object. In our example, each rectangle may be 
defined by its opposite vertices or its center and a vertex. 
Thus each rectangle has as parameters, its four vertices - 
bl t br, ul, ur for bottom-left, bottom right, etc...; its cen¬ 
ter point c, and its extent e in x and y dimensions. The 
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local constraints for an object O over these parameters 
are 

Oca = 0.5 * Oblm + 0.5 * Obra Obry = Obly 

Ocy — 0.5 * Obly + 0.5 * O u ly Oula — Ofala 

Obra ** Obia “f" Oea Oura ~ Obra 

— ^bly "b Oey Oury — Ouly 

OcX ^ 0 0 «y ^ 0 . 

Global constraints are provided by the user and are de¬ 
fined over the parameters of the relevant objects. For 
example, the requirement that T1 is contained in #1 is 
expressed by 

Tibia ^ Rlblsc 
Tlbly > Rlbly 
Tlura < R1 urx 
riury £ Rlvry* 

The entire set of local and global constraints which cap¬ 
ture the previous 4 requirements for Figure 1 is shown in 
Figure 5. The large number of constraints generated by 
this relatively simple example is typical - systems such 
as ThingLabll generate similar numbers of constraints 
for similarly sized examples. The large numbers of con¬ 
straints means that efficient representation and manipu¬ 
lation of constraints is important. 


about the current constraints and presents this in a suit¬ 
able graphical format. For instance, suppose the user 
wants to move the upper-right corner of Rl. Initially the 
vertices of B , one vertex of R2 and the text size are “an¬ 
chored” so that the system will not change their present 
values (see Figure 2). At this point the set of constraints 
is over-constrained. When the user selects the upper- 
right corner of Rl, the system indicates that #1 cannot 
be moved. This is because the constraints imply that the 
coordinates of Rl are fixed. Note that this information 
is not explicit in the original constraints. 

The system can then indicate which anchor constraints 
need to be removed to restore some degrees of freedom 
in the system. If the user now removes the anchor- 
constraint on R2, it is now possible to automatically in¬ 
fer, from the constraints, the possible values for HI. The 
possible values are given by the constraints 

■ftlyra — j'Hlury — 0 

110 < Rlury < 150 

defining the line segment as depicted in Figure 3. The 
system displays this line and the cursor is constrained to 
remain on it. Whenever the cursor is moved, the display 
is updated to reflect the new configuration (Figure 4). 


Anchor constraints are equalities which fix the values of 
variables of objects. For instance, the constraint 

Bbls = Bbly = 0 

expresses that the lower left corner of B is anchored at 
the origin of the screen. Anchor constraints are added 
by the user when the attributes of objects are not to be 
modified. 

When creating this diagram, the user adds and deletes 
graphic objects and constraints, and modifies the param¬ 
eters (or attributes ) of the objects. As constraints and 
objects are added, the solvability of the entire set must 
be checked. If it becomes unsatisfiable, information as 
to which constraints need to be modified to restore solv¬ 
ability is fed back to the user by highlighting a graphical 
representation of the offending constraints. 



Figure 2: Three anchor points were selected 

During a session, as graphic objects are moved around 
and their attributes modified, the system gives feedback 



Figure 3: Range of motion for point Rl ur 



Figure 4: Display reflecting motion of point Rl ur 
3 The Linear Constraint Technology 

From the previous example, we can see that an underly¬ 
ing constraint technology for interactive constraint-based 
systems should provide efficient: 
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• Incremental addition and deletion of constraints. 

• Detection of unsatisfiability and identification of 
which constraints or anchor constraints are causing 
it. 

• Rapid re-satisfaction of existing constraints when 
a small number of parameters, such as the location 
of a vertex, are changed. 

• Detection of under-constrained system and identi¬ 
fication of which parameters can be fixed to con¬ 
strain it. 

• Recognition of an over-const rained constraint sys¬ 
tem and identification of the responsible anchor 
constraints. 

• Computation of the range of values that a param¬ 
eter can take while leaving the system satisfiable. 

Recent results from symbolic computation provide a tech¬ 
nology for linear arithmetic constraints with these capa¬ 
bilities. The key to an efficient implementation is based 
on a new representation for sets of linear constraints 
called the canonical form [10], and a new algorithm for 
variable projection [9]. We now discuss the canonical 
form, how it supports the compilation of plans of execu¬ 
tion or parametric solved forms, and projection. 

The Canonical Form 

The sets of constraints that arise in interactive systems 
often contain redundancy. Local constraints defining at¬ 
tributes of objects often contain redundancy to allow flex¬ 
ible definition of objects. Constraints on objects can be¬ 
come redundant as a user adds further constraints. Be¬ 
cause of the potentially large number of constraints that 
can be generated in interactive systems, it is important to 
have non-redundant representations. In the technology 
we present, this representation is given by the canonical 
form. 

The canonical form 1 of a set of linear arithmetic con¬ 
straints consists of: 

1. A set of equations that defines the affine hull of 
the solution set of the constraints. This affine hull 
is the space having the smallest dimension that 
contains the set of solutions to the constraints. 

2. A set of inequalities that define the full dimensional 
solution set. 

The canonical form has the following important proper¬ 
ties: it contains no redundant constraints; it identifies 
the degrees of freedom in the constraints; and it makes 
explicit the equalities implied by the constraints. 

l Note: the full definition of the canonical form also in¬ 
cludes negative constraints which are not discussed here. The 
interested reader is referred to the reference for a complete 
treatment. 


Eliminating redundancy is important because typically 
the system of constraints may contain many redundant 
constraints mainly due to the local constraints. For some 
systems of constraints, the corresponding canonical form 
has an order of magnitude fewer constraints. Making 
equalities explicit is important because they can greatly 
simplify the set of constraints. The degrees of freedom 
of the set of constraints is given by the number of vari¬ 
ables less the number of equalities in the canonical form. 
This means it is possible to determine if the system is 
under-constrained and which variables need to be further 
constrained. 

Transforming an arbitrary set of constraints into its canon¬ 
ical form is a complex three-stage process using a quasi¬ 
dual formulation of optimization techniques (such as the 
simplex method) from Linear Programming [7, 10]. The 
three stages are identification of implicit equalities, sim¬ 
plification, and elimination of redundancy. The first stage 
performs a test for satisfiability. If the system of con¬ 
straints is unsatisfiable, then a minimal subset of con¬ 
straints causing unsatisfiability is identified. 

Computing the canonical form from scratch is expen¬ 
sive. However, we use an incremental algorithm that ef¬ 
ficiently recomputes the canonical form when constraints 
are added. This is an important consideration when in¬ 
teractively constructing systems of constraints. 

The system given in Figure 5 has the canonical form 
given in Figure 6. The original system has 54 equalities, 
and 28 inequalities involving 32 variables. The canoni¬ 
cal form has 51 equalities, and 10 inequalities involving 
9 variables. Note the substantially reduced number of 
variables in the inequalities. 

The Parametric Solved Form 

One of the most important operations in an interactive 
graphics system is the manipulation of objects on the 
display. To do this efficiently requires computing a plan 
of execution in terms of the object being manipulated, 
and then continually re-executing this plan. 

In our technology, a plan of execution is called a para - 
metric solved form with respect to a set of parametric 
variables. The parametric solved form is a set of con¬ 
straints such that all dependent variables are expressed 
in terms of the parameters, and only the parametric vari¬ 
ables occur in the inequalities. The parametric variables 
are those that correspond to the object being manipu¬ 
lated. Thus the parametric solved form has the property 
that if the constraints contain the variables ®i, ..,® n , and 

the parameters are ®i.*{, then for j = t+1,n, there 

exists an fj such that xj = ..., Zi). 

The system of constraints corresponding to Figure 4 when 
parameterized by variable #l u1 . y has the parametric solved 
form given in Figure 7. This solved form consists of 59 
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Tl ca = 0.5 * Tibia + 0.5 
Tl cv = 0.5*TUi v + 0.5 

Tlfcra — Tibi® + Tl ao 

* Tlbre 

♦ rui v 

T2brc — jP2bl® + T2. 0 

T2 ce = 0.5 * T2 b je + 0.5 ♦ T2 b r« 
T2.y — 0.5 * T2bly + 0.5 * T2„|y 
Tluly = Tlbly + Tl.y 

T2 uJ y = T2bi v + T2., 

Tlbry — Tlbly 


Tl u |a = Tibi® 

T2bry = T2bly 

T2ui a = T2bi® 

Tluta — Tlbrc 


Tl Ur y = Tl tt Jy 

T2 urx = T2br® 

T2ury = T2uly 

Tibia ^ Albl® 


Tlbly ^ Albly 

T2bi® ^ A2bi® 

r2bly ^ A2bly 

Tl*re £ Alure 


Tl ur y < Al nr y 

T 2 U r* < A2ure 

T2ury < A2 UT y 

Tie — Alee 


Tl C y — Alcy 

T2 ce ~ A2 ca 

T2 C y = A2cy 

T Ice > 0 


Tl.y > 0 

T2.x > 0 

T2.y > 0 

Al c * = 0.5 • Albi. + 0.5 
Al cy = 0.5 * Albi v + 0.5 
Albre — Alble + Al.® 

* Albre 

* Al w iy 

A2br® — A2bts + A2.® 

A2 CS = 0.5 * A2bi® + 0.5 * A2br® 
A2 C y — 0.5 • A2biy + 0.6 * A2uiy 
Alu|y — Albly + Al.y 

A2«iy = A2biy + A2« 

Albry — Albly 


Al*.i B — Albl® 

A2bry = A2bly 

A2ulo — A2bl® 

Al U re = Albre 


Al U ry —■ Altily 

A2 U r» = A2bre 

A2 ury ~ A2ui v 

Albl* ^ 2?b|s 


Albly ^ Ably 

A2bi* ^ Abi® 

A2bly ^ Ably 

Alur» ^ A tt T* 


Al UT y < A-ury 

A2 ure < £ U re 

A2 tt ry £ A ury 

Al«. s 2 * Al. v 

Al.* > 0 


A2e® — 2 * A2«y 

Al.y > 0 

A2. 0 > 0 

A2.y > 0 

Abr* = Abt® + A.® 


Abry — Ably 

A-ure — Abrs 

A U ry = Auly 

Aui® = B bt® 


Auly — Ably + Aey 

A.® > 0 

A.y > 0 

Af x = Alble — Abl® 


— A2ble * Albre 

Afy = Albly — Ably 

My — A«Iy — Aluly 

M v — Bbra A2bre 


M v = A2bly - Ably 

Afy — Auly “ A2uty 

M® > 0, Afy > 0 

3 * Mg, + Al e » + A2. x — 

= 0 


2 * My + Al.y — A.y — 0 

Al.y - A2.y = 0 


Figure 5: Original Set of 82 Constraints 


equations and the two inequalities 

110 < Rlury Rlury < 150. 

One can readily see how all variables are either assigned 
to constants or are expressed solely in terms of Rlury 
Thus, only one degree of freedom remains in the solved 
form. 

The parametric solved form is efficiently derived from 
the canonical form (see Section 5). It can then be either 
interpreted directly or used to compile efficient code to 
compute the values of dependent variables as the values 
of parameters are changed. 

Projection 

Projection provides a technique to examine the relation¬ 
ships between particular variables that are implied by 
the constraints. Projecting onto a single variable gives 
the range of values it can take while still satisfying the 
constraints. Projecting a system of constraints onto a set 
of variables shows how these variables are inter-related in 
any solution to the constraints. 

In the example in Section 2, we saw how projecting the 
constraints on to variables R U ra> , Rury indicated that they 
were constrained to be on the line Rl urx — f Rlury = 0 
where 110 < Rlury < 150. 

Unfortunately, the doubly-exponential complexity of gen¬ 
eral algorithms for projection has prevented its use in 
many application domains. Recently, however, a new al¬ 
gorithm has been developed [9] that is very efficient when 
the number of variables in the projection space is small. 
This is exactly the case we are interested in, as we typi¬ 
cally project onto a small number of variables (typically 


one or two) corresponding to the object currently being 
manipulated. 

The new algorithm computes a projection by successive 
approximations using an on-line algorithm for convex hull 
construction in the projection space. It provides an ex¬ 
act solution when the size of the output is small, and an 
approximation (upper or lower) when the size of the out¬ 
put is unmanageable. Previous methods usually failed 
to produce any output, even in small cases, because of 
the enormous amount of intermediate computation. Ini¬ 
tial testing has shown extremely good performance, es¬ 
pecially for small projection spaces. 

4 Proposed Architecture 

In this section we propose an architecture for a constraint 
manipulation sub-system within an interactive constraint- 
based user-interface system. The architecture exploits 
the technology described in the previous section. This 
architecture must address issues of incrementality, inter¬ 
action latency and feedback bandwidth [11]. These issues 
are most critical when anchor constraints and the values 
of the parameters of objects are changed during manip¬ 
ulation. 

To address these issues, we use a two level architecture 
which maintains sets of constraints in canonical form. 
The first, the free canonical /orm(FCF), is a canonical 
form of the local and global constraints. The second, the 
anchored canonical form{ ACF), is a canonical form for 
the entire set of local, global and anchor constraints. In 
addition, local constraints associated with primitive and 
compound objects are kept in canonical form. 

The FCF does not change often, only when local or global 
constraints are added or deleted. However, whenever an 
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Tl cs — -Buts 4" Bgy 4* M. - 2M y 

TIm® — — Tl*r» 4 + 2 Bey *f 2M a — 4 My 

— ri Hras 

— 2Tlur* 2B V | S — 2 B C y — 2M X 4" 

Tl ule = —T1 urn 4* 2 B u j b + 2B <y 4 2Af B — 4My 
T2 caj = B u | x + 3B, V 4- 2M„ - 6M V 
T2 Mb = —T2 urx 4- 2B u t a 4- 6B. V 4- 4M X - 12M V 
— T2 urB 

T 2« = 2T2 14TB - 2B U | X - 6B. V - 4M X 4- 12M V 

^2 tt | x — —T2 urB 4" 2S tt ix 4* 6B C y 4" 4M e — 12Af y 

Bible — B u j e 4* M x 

Bl c » = B v ix 4" B*y 4* M x — 2M y 

Blfcr® = B u jx 4" 2B € y 4" M E — 4M y 

Blti-rx — Bui» 4” 2B <y 4* — 4M y 

Bl cx = 2B 4y - 4M y 

B2fety — Bttly “ Bgy 4” My 

B2gy — B-uJy 0.5B O y 

B2«|y — B u |y — My 

B2»ify — Bitty — My 

Bbt x = B u ix 

Bbre — Buix 4” 4B ey 4” 3M* — 8My 
B»ra — Buix + 4B,y + 3M S - 8Af v 
Bl u tx — Bitis 4* -M* 

B2 u t* — B u ix 4" 2B*y 4" 2A/ X — 4M y 

B2gy — B C y — 2M y 
B«» = 4B. y 4- 3M X - 8My 

My 4“ Tl ttr y Bitty ^ 0 

— B«y — 2Tl ttr y 4" 2Biti y ^ 0 

My — Bitty 4* ^*2 U ».y < 0 
—■Bgy 4" 2Bitiy — 2T2itry £ 0 

— M x <0 


Tlfy — Bitty “ O.SBgy 

rifely — —rixtry 4" 2B u ty — Bgy 

ru,y — Tlttry 

ribt-y — —T*litry 4" 2Buly — Bgy 
Tlgy = 2Tl,* r y - 2Bitty 4- Bgy 
T2cy — Bitty - 0.5B.y 

T2feiy — —T 2 U ry 4" 2Bitjy — B <y 

T2 uI y — T2 ury 

2 1 2fery — — T2itry 4" 2B u |y B cy 
r2*y — 27'2itry — 2B u ty 4" Bgy 
Blfejy — B u ly Bgy 4" <My 
Bley — Bitty — 0.5B ey 
Blitty — Bitty My 
Bl*ry — Bitty My 

B2fei® — B u tx 4* 2B*y 4* 2M X 4My 

B2 ex — Bitt* 4* 3Bgy 4" 2M X — 6 Jl^ y 

B2fe rx — Bitis 4" 4Bgy 4* 2.M® — 8M y 
B2 ttr x = Bitix 4“ 4Bgy 4" 2M* — 8My 
B2.X = 2B.y - 4My 

Bfety — Bitty ~ Bgy 
Bfery ~ B u ty — Bgy 
Bitry “ Bitty 
Blgy — Bgy 2bdy 

Blfery — B u ly — Bgy 4* My 
B2fe r y — B u ty Bgy 4* My 


— 27*1 itrx 4- 2B u tx 4- 2B.y 4- 2M* - 4M V < 0 
~B V t* - 4B, y - 2M* 4- 8M y + T2 ur * < 0 
2Bitix 4- 6B*y 4- 4M X - 12M V - 2T2 urx < 0 

T1 ur * - Buts - 2Bgy - Ms 4- 4My < 0 
-My < 0 


Figure 6: Canonical Form of Original Set - 61 constraints 


anchor constraint is changed, or the picture is manip¬ 
ulated via a new object, a new ACF is incrementally 
computed from the current FCF. Obviously - it is better 
not to redo the work already done in finding redundancy 
and implicit equalities. The role of the ACF is to be a 
parametric solved form with respect to the variables in 
the object currently being manipulated. This allows ef¬ 
ficient updating of the display when the parameters are 
changed. 

When objects or global constraints are added to the sys¬ 
tem new canonical forms are computed from the old one. 
Firstly, an FCF for the new local and global constraints 
is computed incrementally from the old FCF. Then, the 
ACF for the entire system of constraints is computed 
incrementally from the new FCF by adding the anchor 
constraints. This is quite efficient because the anchor 
constraints are always equations. Computation of these 
canonical forms will reveal if the system is unsatisfiable, 
and if so which constraints are at fault. 

Deletion of objects or global constraints is more problem¬ 
atic. However, because local constraints associated with 
each object are kept in canonical form, then a new FCF 
and subsequently a new ACF can be computed relatively 
quickly. Thus the architecture provides: 

• Incremental addition and deletion of constraints. 

• Detection of unsatisfiability and identification of 


which constraints are the cause. 

When an object is selected for manipulation, the ACF is 
examined to see if it is in parametric solved form with 
respect to the variables in the object. There are three 
cases to consider: the ACF is under-constrained, over¬ 
constrained, or it is in parametric solved form. 

Examination of the ACF will reveal the system is over¬ 
constrained if there are no degrees of freedom, that is all 
variables are uniquely determined. For instance, consider 
the anchor constraints in Figure 2. These constraints, as 
follows, 

Btlx = Buy = 0 Burs — 400 Bury = 200 

T\< x = 30 Tl ey = 10 T2 ex = 40 

T2 ey = 20 R2ir X = 300 R2 lry = 87.5 

will fix the width and height of the boxes containing 
text. Before adding these anchor constraints, the FCF 
is that shown in Figure 6. When the anchor constraints 
are added, we derive from the FCF, the following very 
simple ACF with 60 equations that assign each variable 
to a constant: 

Tl cy = 100, Tibia = 110, n Wy = 95, ... 

As every variable is uniquely determined, there are no 
remaining degrees of freedom. Thus, with these anchor 
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Tie. = 200- %Rl*r V 

Tlfel* = 185 - ZRUry 
Tib.® = 215- |m ury 
Tl^l® = 185 - f Jilury 
Tl„ = 30 

T2 cm — 200 + y -Rlxvry 
T2 m« = 180+ $RUry 
T2i>re = 220+ ■jiilii.ry 
T2 u Ie — 180+ ^Al ur y 
T2„ = 40 

•flic* — 200 — ^jRl U t*y 
i?lbJ. — 400 — jAl^ry 
^Hr# — J'-Rlury 
J*1 U |« = 400- §JU ury 
Bl«* = -400+ 4fll ury 
H2bi. — 400 — |-i21 ury 
R2 CS — 200— ^i21 ttr y 

R^brs — TjRlttfy 

R2 u rr — jfilury 
A2 (S = —400 + 4Bl ury 
Bbis = 0 
B h ras = 400 
B ttre = 400 

£ u j. — 0 

B,. = 400 
Tl^r* — 400 — 

T2 ur * — 220— jAl^ry 
Af x =400- §Bl ttry 

— J^lvry 
110 < Blury 


Tl cy = 100 

Tl W y = 95 

Tl^ry = 105 

Tlfcry = 95 

T l« y = 10 

T2 cy = 100 

T2 bIy = 90 

T2 tl(y = 110 

T2 b ry = 90 

T2« y = 20 

Bl cy = 100 

B1m v = 200 - Hl ttry 

Blbry = 200 - i?l ury 

Bl u i y = -400 + 4HU ry 
i?l*y — —200+ 2Blnyy 
i22 b i y = 200 — J21 ury 
B2 cy = 100 

R^bry — 200 — -Rl-ury 
R2-ury = i?lury 
R2 my = 200- Bl ury 

Bbty = 0 
Bbry = 0 
-Bury = 200 
B u ly = 200 
B« y = 200 
Tl ury = 105 
T2 ury = 110 

M v — 200 — R. 1 v r y 


Hlury < 150 


Figure 7: Paremetric Solved form in Ru ry - 61 con¬ 
straints 


constraints the system is over-constrained - it only has 
one solution. Note that this was certainly not obvious in 
the original set of constraints in non-canonical form. 

At this point we need to determine which anchor con¬ 
straints need to be removed. To do this, a set of in¬ 
equations are added corresponding to the variables in 
the object that we wish to manipulate. This of course 
results in an unsatisfiable system of constraints, but, in 
computing the new FCF, the unsatisfiable system can be 
analyzed to find other constraints, other than those we 
just added, which cause the unsatisfiability. These can 
be then removed by the user. 

If we consider only two anchor constraints instead of 
three, as in Figure 3, the ACF, as shown in Figure 7, is 
in parametric solved form having parameter Rl ur y This 
solved form consists of 59 equations and 2 inequalities, 

110 < RUry Rlury < 150 (1) 

and clearly has only one degree of freedom. This solved 
form can now be compiled into code to update the de¬ 
pendent variables whenever this parameter is changed. 

If the ACF is not in solved form, then the constraint sys¬ 
tem is under-constrained. From the ACF it is straightfor¬ 
ward to determine possible choices for additional anchor 


constraints. These are variables which are not uniquely 
determined by the desired parametric variables of the 
ACF. 

Thus the architecture provides: 

• Rapid re-satisfaction of existing constraints when 
a small number of parameters, such as location of 
a vertex, are changed. 

• Detection of under-constrained system and identi¬ 
fication of which parameters can be fixed to con¬ 
strain it. 

• Recognition of an over-constrained system and iden¬ 
tification of anchor constraints responsible for the 
unsatisfiability. 

When an object is selected for manipulation, the ACF 
is projected onto the object’s variables giving the ranges 
of values that they may take while still satisfying the 
constraints. As the constraints are linear, the range will 
always be a convex polygon allowing it to be easily dis¬ 
played. In the example, we saw that the upper right 
corner of iEl was constrained to move only on a diago¬ 
nal line. In other situations it might be constrained to 
move inside a small region. In both cases it is possible to 
present these regions graphically to the user. 

Thus the architecture provides: 

• Computation of the range of values that a param¬ 
eter can take and still leave the system satisfiable. 

Interactive editors must support the definition and com¬ 
pilation of compound objects. Although this is straight¬ 
forward in systems without constraints, the addition of 
constraints raises new issues. In particular, efficiently 
representing the constraints in the compound object, and 
determining which variables of a compound graphic ob¬ 
ject define the objects it contains. 

An efficient representation for the constraints of a com¬ 
pound object can be obtained by computing the canoni¬ 
cal form for all the constraints inside it. 

Defining a compound object is more problematic. Intu¬ 
itively, the user should be able to select a group of ob¬ 
jects on the display, and then the editor should abstract 
the desired definition from this instance of the definition. 
Essentially this can be done as follows. Once the user 
has selected a set of objects to form the compound ob¬ 
ject, selected anchor constraints must then be removed 
to permit the compound object to be located freely, and 
finally a set of definitional variables which ensure that all 
its internal objects are well-defined - none of their vari¬ 
ables are left unspecified - must be selected. The last 
step can be done by ensuring that the constraints in the 
compound object are parametric solved form for these 
definitional variables. 
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Projection can be used to simplify the constraints associ¬ 
ated with the definition, acting as a type of compilation 
or partial evaluation. The idea is to project the con¬ 
straints onto the definitional variables of the new object: 
this will remove local or internal variables from the con¬ 
straints, and so simplify them. 

For example suppose we were constructing a square com¬ 
pound object from the rectangle definition given earlier. 
The square object has the additional constraints that the 
x and y extents are equal. If we allow the square to be de¬ 
fined in terms of a diagonal and a point, then the square’s 
local constraints are simpler than those of the rectangle. 

Thus the architecture supports: 

• The definition and compilation of compound ob¬ 
jects. 

5 Empirical results 

We now present empirical results concerning the perfor¬ 
mance of this constraint technology. We consider the fol¬ 
lowing operations which are typical of those which occur 
during an interactive session: 

1. Testing the solvability of a set of constraints and 
computing their canonical form. 

2. Adding new constraints or anchor points, and com¬ 
puting a new canonical form. 

3. Generating a parametric solved form in terms of a 
set of specific variables that correspond to a point 
which is being dragged. 

All these operations are handled by a new incremental 
constraint solving system we are developing. This system 
integrates algorithms for: testing solvability; computing 
the canonical form; performing Gauss-Jordan reduction 
to obtain parametric solved forms; and performing pro¬ 
jections. This system is implemented in C++ on an IBM 
Rise System/6000, model 530 running ADC. Run times 
are measured in virtual CPU seconds. 

In this evaluation, we use three sets of constraints as 
test data and time the operations described above. The 
results are shown in Figure 8. 

The first test corresponds to the well known example 
of recursively nested quadrilaterals (see Figure 9). The 
initial set of 76 constraints consist of constraints that 
ensure that the endpoints of lines in the quadrilaterals 
touch, that the midpoints of the quadrilaterals form a 
parallelogram, and that the vertices of the embedded 
quadrilaterals are at the midpoints of the edges of the 
enclosing quadrilateral. It takes 0.06 seconds to trans¬ 
form this system into its canonical form which contains 
64 constraints. (Remark: the twelve constraints that are 
eliminated are exactly those that constrain the midpoints 



of each quadrilateral’s edges to form a parallelogram. In 
retrospect, this was to be expected. The geometry theo¬ 
rem that this example illustrates states this fact!) To this 
system, we then add the three anchor points, consisting 
of 6 equality constraints, shown in Fig. 9. It takes 0.11 
seconds to derive a new canonical form. Lastly, the time 
taken to obtain the parametric solved form in terms of 
the unanchored vertex of the quadrilateral is below the 
resolution of our timing system (1/100th of a second). 

The second test corresponds to that given in section 3. 
The initial set of constraints (see Figure 5) consists of 82 
constraints (54 equalities, 28 inequalities) over 60 vari¬ 
ables. It takes 0.80 seconds to convert this to its canoni¬ 
cal form (Fig. 6) containing 65 constraints. We then add 
2 anchor points, consisting of 4 equalities. It takes 0.17 
seconds to compute a new canonical form. Finally, the 
parametric solved form in terms of the point 
(Rlurx, Rlury) takes 0.01 seconds to compute. 

The third test, although from outside the graphics do¬ 
main, has similar characteristics (sparsity, relative per¬ 
centage of equalities and inequalities, etc..) as the pre¬ 
vious examples. It is of interest because the initial con¬ 
straints are all inequalities. The initial set of constraints 
consist of 1819 inequalities over 68 variables. This is 
simplified into 58 equalities plus 90 inequalities over 10 
variables. Then, 10 new equalities are added and the set 
is updated accordingly. Finally, 10 variables are chosen 
to obtain a parameterized representation. 

In these examples, the advantage of keeping constraints 
in canonical form is demonstrated by how little time it 
takes to add constraints, and to compute the parametric 
solved forms. The most time consuming operation is the 
detection of equalities implicitly defined by inequalities in 
the original system. This can be significant when there 
are large number of inequalities. However because the 
system is incremental, it will rarely, if ever, have to deal 
with the size and complexity of dataset 3 in an interactive 
application. Adding large numbers of inequalities at once 
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Total Constraints 

N (=,<) 

Computing Canonical Form 
Constraints (=,<) (sec) 

Addition 

(sec) 

Solved Form 
(sec) 

Recursive Quads 

76 (76,0) : 72 var. 

64 (64,0) 0.06 

0.11 

<0.01 

Layout: 

82 (54,28) : 60 var. 

65 (51,14) 0.80 

0.17 

0.01 

Dataset 3: 

1819 (0,1819) : 68 var. 

148 (58,90) 17.96 

0.28 

0.01 


Figure 8: 


is unlikely. 

6 Conclusion 

We have shown how recent results and algorithms to ma¬ 
nipulate linear arithmetic constraints provide a technol¬ 
ogy for interactive constraint-based user-interface sys¬ 
tems. This extends previous constraint technology for 
user interfaces, by allowing simultaneous linear equations 
and inequalities, and by providing techniques which give 
improved user-feedback. 

We have proposed an architecture based on this technol¬ 
ogy and are currently implementing a interactive constraint- 
based editor based upon it. 
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ABSTRACT 

A new radiosity model is presented in which all patches 
are represented as isoparametric elements and the patch lumi¬ 
nances change bilinearly. The surfaces are tessellated into pla¬ 
nar quadrilaterals and continuous surface luminance is main¬ 
tained where patches meet. A new form factor, accounting for 
the luminance contributions between patches, is derived and 
calculated using hemisphere projections and Gaussian quadra¬ 
ture. Images generated from the new approach were tested by 
pixel-level comparison with real images acquired by a cali¬ 
brated imaging system, and compared with the images gener¬ 
ated by the uniform patch luminance radiosity. The compari¬ 
son results indicate that fewer bilinear patches are required to 
achieve comparable luminance accuracy. 


The interreflection problem is equivalent to the radiosity 
method as used in thermal engineering studies of radiative heat 
transfer! 10]. Surface luminance is a function of the light from 
sources and the interreflected light from other diffuse surfaces. 
The key concepts are energy conservation (all light leaving a 
surface must be accounted for) and energy equilibrium (the 
total light out of any surface equals the total light in times 
the surface reflectivity). In the radiosity calculation, surfaces 
in a scene are divided into planar patches. The luminance at 
any point on a patch due to the light emitted by another patch 
can be calculated using the analytical methods summarized in 
[12, 14, 17]. The luminance can be expressed as: 

r f r cos 0 k cos 9i j „ x 

L = p L s -- da, (1) 

A* 


CR Categories and Subject Descriptors: 1.3.3 [Computer 
Graphics]: Picture/Image Generation; Display Algorithm; 
1.3.7 [Computer Graphics]: three-dimensional Graphics and 
Realism. 

General Terms: Algorithms. 

Additional Key Words and Phrase: radiosity, accurate im¬ 
age, non-uniform patch luminance, pixel level comparison, 
image registration. 


1. INTRODUCTION 

Radiosity methods for photo-realistic image generation 
have recently been the subject of intensive research interest 
in computer graphics. These methods involve the accurate 
determination of surface luminances before the pixel image is 
rendered. Since each diffuse surface receives light from and 
emits light to other surfaces visible to it, this interreflection 
problem can be modelled as a set of equations relating the 
luminance of each surface to the luminances of all other 
surfaces in the scene. 


where p is the surface reflectance, L s is the source luminance, 
r is the distance between the point and the source, Ok is the 
angle between r and the normal of the point, Oi is the angle 
between r and the normal of the source, and the da, is the 
differential area on the source. 

Current radiosity methods assume that the luminance 
is uniform on each patch[9]. Thus, if we use only average 
luminance values for each patch in a set of M patches, then 
Lk . the luminance of patch k, is a linear function of the 
luminances of the other patches, plus the self-illumination 
term L ° k : 


M 

Lk = pk^2FkiLi -f L° k 

i =i 


( 2 ) 


where p k is the reflectance of patch k, and the form factor 

Fkh given b y 



cosOkCosOi 



da\da k 


is a function of the relative positions and orientations of the 
two patches. Ak is the area of patch £, dai and dak are, 
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respectively, the differential area on patch / and k. For an 
enclosed scene, additional information can be obtained by 
using the following three principles. Energy conservation 
requires that the energy leaving a surface must equal the 
energy into that surface times the surface reflectivity. That 
is. 



for k = 1,..., M, where A k is the area of patch k. Reci¬ 
procity (the Second Law of Thermodynamics) requires that: 

A k F kl = AiFi k 

and the principle that no planar polygon may illuminate itself 
can be written as: 

Fkk = 0 


where /(x a , j/ 2 ), f x (x i, y t ) t and f y {x i, yi) are, respectively, 
the value and partial derivatives of the function at the point 
( x \, y\ ) and f xy (xi, y \) is the mixed second derivative. Stan¬ 
dard radiosity approaches assume that the luminance is con¬ 
stant, therefore they use only the zero order term f(x \, y\) to 
describe the surface luminance of a patch, that is, /(x, y) = 
f(x i i yi) for a given polygonal patch. To model non-uniform 
luminances, at least the first order or linear terms need be used. 
A continuous luminance model and quadrilateral tessellation 
model based upon the crossterm Taylor series expansion is 
used in our discussion. 

The interior luminance, in a quadrilateral patch, may be 
expressed as a bilinear interpolation of the luminances at the 
four vertices Lj(j = 0,1, 2,3) : 

J=0 


A number of algorithms [1, 2, 5, 6, 7, 8, 13, 16] have 
been developed for computing form factors and for solving 
the interreflection problem. All assume patches of uniform 
intensity. This assumption simplifies the calculation of form 
factors, but it can in many cases require that a scene be 
subdivided into a number of patches. 

In this paper we develop a new radiosity model which 
allows the luminance of surface patches to vary bilinearly 
across their patches. We develop an effective approach to 
calculate form factors for such patches by using hemispher¬ 
ical projections and Gaussian quadrature in situations when 
analytic solutions do not exist. We then undertake a system¬ 
atic comparison of images produced with bilinear luminance 
patches and uniform luminance patches to determine the ef¬ 
fectiveness of our model in increasing accuracy with a smaller 
number of patches. This is done by comparing these images 
against a calibrated real would test scene. 

The paper is organized as follows. Section 2 presents oyr 
radiosity model and develops a form factor function from a 
bilinearly illuminated quadrilateral patch to a receiver vertex. 
Section 3 shows how to compute such form factors using 
hemisphereical projection and Gaussian quadrature. Section 4 
contains the experimental results of our accuracy tests. 

2. NEW RADIOSITY APPROACH 


where the Nj , called shape functions, depend only upon the 
shape of the quadrilateral. Their explicit forms are given as 
follows: 

tfo = i(l-3)(l-t) AT 1 =i(l+ S )(l-0 

N* = i(l + s)(l + 0 N 3 = i(l -s)(l +<) 

where s and t are parameters, satisfying |s| < 1, |t| < 1. 
Shape functions satisfy the following rules: 1) The mapping 
between the a plane in the xyz space and the st space is 
homeomorphic. 2) Each shape function has value one at its 
own vertex and zero at others. 3) Each shape function is 
zero along any edge that does not contain its vertex. 4) Each 
shape function is a polynomial of the same degree as the 
interpolation equation. 

In the simulation, triangles and quadrilaterals were se¬ 
lected as primitives. It is easy to show that the triangle is 
basic; any planar polygon can be exactly decomposed into 
triangles. Since triangles are degenerated quadrilaterals with 
two vertices coinciding, all the patches will be represented as 
quadrilaterals. 

Now consider the interreflection between two diffuse 
patches with luminance described by Eq,(l). From Eq.(4) 
, the luminance contribution of patch / to the tth vertex of 
patch fc, (see Fig.l), is: 


The basic question addressed in this section is as follows: 
Given that a patch has non-uniform luminance, what mathe¬ 
matical model of that luminance will be sufficiently accurate, 
yet still allow for practical calculation of interreflection form 
factors? Any surface luminance distribution which is an ana¬ 
lytic function f(x,y) of the coordinates has a Taylor’s series 
expansion: 


ii = Pk f t K A - « i Ki, L, 

£3 ,rr % 


where p k is the diffuse reflectivity of patch k t L 3 is the 
luminance of the jfth vertex of patch /, da\ is a differential 
area on patch and 


/(*.») = /(*i,yi)+/*(*i.yi)(*-*i)+/t/(*i.yi)(y-yi) 

+/*,{*!. yi)(* - *i)(y - yi) + • • • 


K> } =Jn j 

A l 


cos Bj cos 9 


J-dai 


(5) 
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is the new form factor corresponding to the bilinear patch 
luminance distribution. 


L3 



Fig.l Luminance contribution from one 
patch to one vertex of another patch 


From Eq.(6), the total flux out of the patch k is the 
summation of the luminances received by the rest of the 
patches: 

= Lidai 

la 1 


N r 3 

= f Yi K > iLi 

1=1 JA \ i =0 
3 

= r ^ ^ Li Bi 


da i 


where: 


4, = / N % 

JA k 


da k 


* = £/ / N, C ° 36iC T 6> daidda k 


| = 0 J A * ,fA t 
Energy conservation requires that: 


p k $in = $out 


that is 


Therefore, the total luminance from patch l to the vertex 
i of patch A: is a linear combination of the vertex luminances of 
patch 1, and vice versa. For an enclosed scene with M patches, 
there will be 4 M vertices. With self-luminance included, the 
luminance on each vertex can be written as 

N 

L, = p k J2 Ki } L, + L° (6) 

J = 1 

where iV < 4Af. 4 M is an upper bound; the actual number 
may be lower. Many of the vertices coincide where patches 
meet and the redundancies can be eliminated. For a large 
surface subdivided into K 2 patches, there are only (K + l) 2 
independent vertices. In Eq.(6), i is the vertex index of patch 
k and j is the vertex index of patch f, with k = (i mod 4) 
and l = (j mod 4). 

Before calculating the form factors, the physical correct¬ 
ness of the new model needs to be verified. That is, we need 
to see if this model follows the energy conservation and reci¬ 
procity laws. To maintain energy conservation, the energy 
leaving a surface must equal the energy arriving at this sur¬ 
face times the reflectivity. The luminance at a point on patch k 
equals the summation of all the contributions from the whole 
scene and the total flux into the patch is 


3 3 

j>-2> 

j=0 i=0 

Consider a special case when only the jth vertex has 
non-zero luminance. In that case, 

Aj = Bj for j = 0,1, 2,3. 

A x is the area integral of shape function Ni, because: 

3 

^ Ai = Area of patch k 

t=0 

This can be written in a form comparable to Eq.(3): 



Reciprocity can be written in a form comparable to : 


Lkda k 


=-/ 

« h k 

jt a 

= — / Y^NiLida k 


i=0 


Fh = Fjk 


where i and j satisfy the condition: i mod 4 = j mod 4. 
By virtue of this constraint, the shape functions involved in 
the form factors are the same. Therefore, it is guaranteed that 
the Fn are the same. The physical meaning of this equation 
is that the fraction of the flux leaving patch k and arriving at 
patch / equals to the fraction of the flux leaving patch / and 
arriving at patch fc, therefore the system is stable. 


Graphics Interface ’92 



313 


3. CALCULATION OF THE 
NEW FORM FACTOR 

In this section, hemisphere projection^, 4] is used for 
calculating the form factor of Eq.(5) for an arbitrary convex 
planar polygon. See Fig.(2) for an example of the hemi¬ 
spherical projection of a triangular patch. The edges of any 
diffusely-emitting planar polygon which illuminates the point 
at the center of the hemisphere will intersect the sphere in arcs 
which are portions of great circles. Those arcs orthogonally 
project into portions of great ellipses on the equatorial plane 
of the hemisphere, these portions of great ellipses form an 
elliptical polygon on the equatorial plane. The area of that 
elliptical polygon is proportional to the illuminance of the 
original planar polygon. What we present here is the basis for 
an algorithm to calculate the illuminance at an arbitrary point 
P due to the light emitted by a luminous planar triangular 
patch. 



Fig.2 Hemispherical projection of a triangular patch 


1) Construct a unit hemisphere with its center at the point 
P and its equatorial plane tangent to the surface containing P . 

2) Project the three vertices of the planar triangle on to 
the unit hemisphere using lines that pass through the center of 
the sphere. Given three points: A> B % and C, we have three 
unit vectors A, B, and C as following: 


A = 


PA 

\PA\ 


B = — c- — 

\p%\ \pd\ 


3) Every pair of points lies on a uniquely determined 
great circle; the intersection of the three great circles form 
a spherical triangle, which is the spherical projection of the 
planar triangle. Arbitrarily select two vectors A and B. The 
normal to the great circle containing A and B is: 


N = -*i(AxB) 


M 3 = Z x Ml = * 3 (Z x (Z x N)) 

where is the normalization factor for Mi, and note that 
|Mi| = 1, |M 3 | < 1. 

5) Using the standard formula for the area of an elliptical 
sector, the area of the elliptical triangle is computed. This 
is proportional to the illuminance produced by the planar 
triangle. 



And the form factor is now: 



Gaussian quadrature can be applied to the 2-D form fac¬ 
tor integral of Eq.(5) when the hemisphere projection has no 
analytic solution. The Gaussian quadrature method multiplies 
the function evaluated at a matrix of interpolation points with 
known weights for each point. The integral is approximated 
as a summation of these weighted values: 


/ 


N X 

f(s y t)dsdt — ££/(«. (8) 

ial jxsl 


The accuracy of the approximation depends upon the 
number of interpolation points, Ni and JV 3 . The differential 
area of a patch can be expressed as a vector which is the cross 
product of two vectors lying on the patch 

da = dl a x (fit 


where 

dl, = (z,x + y $ y + z 9 z)ds 
(fit as (x*x + y t y + z t z)dt 

with x, y, and z are the unit vectors along x, y , and z axis, 
x j and x t are the partial derivatives of x with respect to s and 
t, similarly for y tt z Bt and z t . Then with the interpolation 
method, an arbitrary 3-D planar quadrilateral in xyz space 
can be transformed to a 2-D square in the st spacefill, (see 
Fig.3), with sides of length of 2: 

* = 51 N i*i V = YL * = 51 N > z >- 

j=»0 ;a0 

With dt 7 = dsdt t the directional differential can be 
expressed as: 


where k\ is the normalization factor for N. 

4) The spherical triangle is projected on to the equatorial 
plane forming a planar elliptical triangle whose edges are arcs 
of uniquely determined great ellipses. The major axis vector 
Mi and minor axis vector M 3 are: 

Mi = Jk 3 (Z x N) 


da ss (S,x + S y y + S x z)d<r 


The form factor K aj can then be computed by Gaussian 
quadrature: 



Nj 


(N, • r)(N, , r) 

7TT* 4 


det\J\d<r 
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The form factor Ki } can then be computed by Gaussian 
quadrature: 

Kij = / Ni 

A.. ,rr4 

where r is the vector from vertex i on patch it to the dif¬ 
ferential area dai, det\J\ is the determinant of the Jacobian 
transformation matrix from xyz space to st space. Note here 
that J is only a 2 x 2 matrix; since the patch is planar there 
are only two independent variables among x t y, and z. It is a 
quantity which includes the weight due to shape functions and 
the space transformation. The sign of (N* • r) and (Nj • r) 
can be used to decide whether or not the illuminating patch 
k is facing the patch /. By now, the integral has been trans¬ 
formed to the at domain over a 2 x 2 square, and Eq.(8) can 
be applied directly for form factor calculation. 



Fig.3 Isoparametric transformation 
from xyz space to st space 


It is useful to describe a situation in which hemispher¬ 
ical projection leads to an analytic solution where numerical 
solution would not converge. This is the problem raised by 
considering the luminance on the vertices instead of the center 
of the patch. Consider the case in which two planar patches, 
not lying in the same plane, meet along a common edge, i.e., 
the point Pa on patch A coincides with point Pb on patch 
B. The luminance at the point Pa due to the patch B will 
involving all the points on the patch £, including Pb . Since 
the distance between Pa and Pb is zero, Gaussian quadrature 
will not give correct result While in the hemisphere projec¬ 
tion, P A is at the center of the hemisphere, and the patch Pb 
will be projected on to the hemisphere. Then the form factor 
can be calculated analytically by the process described in [4], 


Patch A 



Patch B 



Fig.4 Contribution from close 
points on a neighboring patch 


4. SIMULATIONS, 
EXPERIMENTS, AND 
COMPARISONS 

Visual assessment, the primary means for judging the 
realism of images, is inadequate for assessing simulation ac¬ 
curacy because the human eye is not capable of absolute lu¬ 
minance measurement. Our experiments provide a more ob¬ 
jective accuracy comparison in the following way. First, im¬ 
ages were simulated using the new radiosity model. Then 
corresponding real scene images were obtained and objective 
methods for comparing them with simulations were devel¬ 
oped. Lastly, the images simulated by uniform luminance 
and non-uniform luminance assumptions ware compared to 
give a validated measure of the new method. 



Fig.5 Vertices definition on the surfaces of a box 
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To test the new radiosity model, simulations of images 
of real objects is necessary. Scenes were simulated out of en¬ 
vironments made of boxes and pyramids. The reason for using 
such simple figures is that we wanted to be able to construct 
physical replicas that could be compared with synthesized im¬ 
ages. The luminances on the internal vertices of patches in the 
synthesized images were calculated to simulate the read scene 
luminances. Fig.5 shows the twenty four internal vertices of 
a box made of materials with diffuse reflectivities po through 
p 5 . Note that the front face has been removed and displayed 
to the right to allow looking inside the box. The simulated 
images are shown in Fig.6 ( upper left is a box, upper right is a 
box with two comers cut, lower left is the cube connected to 
a pyramid, and lower right is the cube connected to a pyramid 
with the top of the pyramid cut. 

To quantitatively evaluate the simulated images, a sim¬ 
ple experimental model (the inside of a box fabricated with 
diffuse cardboard with various reflectivities) was constructed 
and photographed. It is impractical to photograph the inside 
surfaces of a closed box so we used an analogous arrange¬ 
ment. Lights were carefully arranged to produce a uniform 
illuminance on a diffuse translucent surface with a hole in it 
A camera looked through the hole into the open side of the 
box. All of the box surfaces have different reflectivities. The 
dimensions and measured reflectivities of the box materials 
were used in generating the simulated image in Fig.7. The 
corresponding real digital image produced by this experiment 
is shown in Fig.8. 



Fig.8 Real image of the box corresponding to Fig.7 


Before images can be compared at the pixel level, they 
must be registered to compensate for the unavoidable dif¬ 
ferences between the real and simulated imaging conditions. 
That means that the image gray levels must be normalized and 
affine transformations must be applied to account for transla¬ 
tion and rotation of objects and for perspective errors caused 
by incorrect lens focal length or camera orientation. 

It may be necessary to apply some sort of low-pass filter¬ 
ing, such as Gaussian filtering, to the images to prevent alias¬ 
ing effects. This is often done when registering images! 151. 
The generated image was resampled to match the size of the 
real image to the sub-pixel range and a Mean Square Error 
Root was computed: 


MSE(m,n) 



+ + n ) ~ ■'/»(«• j'P 

NiN 3 


where m and n were the distances the simulated image was 
moved in the 2-D pixel space, f\ and /a are the pixel lu¬ 
minances of the simulated and real images, N\ and .V 3 are 
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image height and width in pixels. Homogeneous transforma¬ 
tions were applied to one of the images and the gray level 
scale factor 7 was adjusted until the MSE was minimized. 
This residual MSE is composed of three components: noise 
and other errors in the image capture system; round-off and 
other errors that are inherent in the resampling process; and, 
simulation inaccuracy. By measurements on the image cap¬ 
ture system and by calculations and studies of resampled test 
images, the first two sources of error have been quantified. 
The remaining error is the simulation accuracy error. This 
was computed for the real and simulated images of boxes. 

The big luminance difference on the side faces between 
the real and simulated images can be qualitatively explained as 
following. The major reason is that in the case of simulation, 
the distance from the source to the receptor ranges from 0 to 
2 , while for the real image, the range is from 2 to 4 since 
the box is open for the light source and the camera. The 
corresponding luminance ratio by [14] is: 


7(0) - 7(2) 
7(2)-7(4) 


36.55 


Due to the interreflection, the actual ratio will be smaller than 
this number, but the effect will be very significant The 
back face of the real image is not affected much. Then 
normalization is applied, and it raises the overall luminance 
of the real image. That is why the real image has much lower 
contrast than the simulated one. 



Fig.9 Image registration process 

The comparison process is shown in Fig.9. Resampling 
registers the sizes of the real and simulated images for MSE 
comparison. Homogeneous transformation places the two im¬ 
ages in corresponding positions and orientations. Modification 
is necessary because the simulation modeled a closed environ¬ 
ment but the experimental environment is an open one. Using 
hemisphere projection techniques this error can be calculated 
and compensation can be applied, producing a modified sim¬ 
ulated image (see Fig.10) with residual MSE of 4.2%. An 
error image of the real and modified simulation shows that 
most of the remaining error is due to misalignments along the 
edges between box surfaces. Some of that error is due to the 
difficulties in constructing a box to sub-pixel accuracy, so the 


measured MSE is probably an upper bound. The actual edge 
locations could be measured and used as inputs to the simu¬ 
lation if a better estimate is required. Though the comparison 
methods could be refined further, the current results show that 
the new radiosity method produces accurate simulations with 
accuracy near the noise limit of the image capture system. 



Fig.10 Modified simulation image 

MSE 
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(b) Best Match List: 


Fig. 11 Patch number comparison between 
new and current radiosity methods 

Fig.ll shows that the new radiosity method requires 
fewer patches to produce same accuracy as current radios¬ 
ity methods. In Fig.ll, SNCM means Subdivision Number 
of Current Method, with each surface divided into SNCM 2 
patches; SNNM means Subdivision Number of New Method 
with each surface divided into SNNM 2 patches. Fig. 11 (a) 
shows the MSE between the images by new method and those 
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by current methods, Fig. 11(b) shows the best match list be¬ 
tween SNCM and SNNM. In the worst case, the new approach 
uses one fourth of number of the patches required by current 
radiosity methods. 


5. CONCLUSIONS 

A new radiosity model with non-uniform and continu¬ 
ous patch luminance distribution has been presented. Hemi¬ 
spheric projection was used to calculate the form factor an¬ 
alytically where possible. Otherwise Gaussian quadrature is 
applied. Images for simple 3-D closed scenes were simulated 
with the new approach. The corresponding real image from a 
calibrated imaging system was compared, on a pixel-to-pixel 
basis, instead of the human visual assessment. MSE mea¬ 
surements were used to evaluate the simulated image and the 
comparison results are presented which demonstrate that very 
accurate simulations of a simple environment can be achieved 
using the model. Images generated by uniform patch lumi¬ 
nance radiosity method were compared with the images pro¬ 
duced by the non-uniform method, to determine the increase 
in the number of patches required to achieve a given accuracy 
using uniform luminance instead of non-uniform luminance, 
the new radiosity model was proved. 

The new radiosity model allows greater flexibility in 
tessellation and provides accurate interreflecdon calculations 
with fewer quadrilateral patches. The quadrilateral tessella¬ 
tion is desirable when curved surfaces are used, tessellation 
into Bezier patches is most easily accomplished using quadri¬ 
lateral patches. Also, the new luminance model naturally uses 
quadrilaterals because of the coordinate transformations that 
are performed. 

Since the luminance distribution is done as part of the 
interreflection calculation, the interreflection and shading cal¬ 
culations are performed simultaneously. Therefore the con¬ 
sistency of scene luminance simulation and digital image ren¬ 
dering has been kept This alleviates the need to use an ad 
hoc interpolation scheme during rendering. Although substan¬ 
tially reduced when compared to previous radiosity methods, 
Mach band effects remain at common edges where patches 
meet. This is due to the fact that although the luminances are 
continuous, the directional derivatives may not be. It appears 
to therefore be necessary to use higher order approximation to 
the surface luminance to totally eliminate Mach band effects. 
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Abstract 

This paper describes a two-pass implementation of a 
physics-based global lighting model. This latter uses a 
physics-based reflection model, spectral distribution of 
light powers, and does not make any assumption on the 
specular behavior of materials. The scenes are discretized 
into points instead of patches. Hence, any kind of surface 
can be used without having to break down it into small 
planar patches. A data structure, named visibility graph, 
is built to efficiently evaluate the visibility between the 
sample points of the scene. Even though the photomet¬ 
ric properties of surfaces (reflection, transmission, rough¬ 
ness, emitted powers...) are modified, this graph does not 
change, which makes it easy to produce very quickly sev¬ 
eral images. Methods for computing the Fresnel factor 
are given in appendix. 

Resume 

Cet article decrit une mise en oeuvre, en deux phases, 
d’un modele d’eclairement global derive de la physique. 
Ce modele utilise un modele physique de reflexion, des 
densites spectrales d’energie et ne fait aucune hypothese 
sur l’aspect speculaire des materiaux utilises. Les scenes 
sont discretisees en points et non en carreaux, ce qui per- 
met d’utiliser n’importe quel type de surface, sans avoir 
a les subdiviser en petits carreaux planaires. Une struc¬ 
ture de donnees, appelee graphe de visibilite, est intro- 
duite afin de determiner efficacement la visibilite entre 
les points echantillons de la scene. Ce graphe reste in¬ 
variant lorsque les proprietes photometriques des surfaces 
(reflexion, transmission, rugosite, energies emises par les 
sources) sont modifiees, ce qui permet de produire rapi- 
dement plusieurs images d’aspect different. Enfin, des 
methodes de calcul du facteur de Fresnel sont donnees en 
annexe. 


*CSTB, Eclairage et Colorimetrie, 11 rue Henri-Picherit, 
44300 NANTES Cedex 03, FRANCE. Tel: +33 40 37 20 00, 
Fax: +33 40 37 20 40 


1 Introduction 

Photosimulation consists in producing highly realistic im¬ 
ages. The realistic aspect of materials can be simulated 
only with the use of physics-based reflection and trans¬ 
mission models. Such models have been introduced in 
[11, 17, 22]. To accurately evaluate the illumination of 
synthesized scenes, a global model is required. The imple¬ 
mentation of this global model can be performed accord¬ 
ing to three approaches: one-pass methods [24, 21,25,19], 
two-pass methods [28, 30), or multi-pass methods [26, 8]. 
The one-pass methods perform all the illumination com¬ 
putations independently of the view point, allowing then 
a fast rendering of the same scene from different view 
points. However, these methods need a large amount of 
memory to store data. Another drawback is the alias¬ 
ing defects due to sharp variations of specular reflections 
and specular transmissions. To avoid these defects, a very 
dense sampling of the scene is indispensable, which would 
significantly increase the data to be stored. 

In the two-pass methods, the diffuse and specular compo¬ 
nents (from reflection or transmission) are computed sep¬ 
arately; the notion of form factors are then extended to 
account for the specular effects contributing to the global 
diffuse component. In our opinion, these methods seem 
efficient since they offer a good realism and a non pro¬ 
hibitive computing time. 

Even though the multi-pass methods are better suited to 
the rendering of caustic effects, they are very time con¬ 
suming since they involve several passes: Monte Carlo 
path tracing, light tracing, progressive refinement radios- 
ity... 

For the reasons quoted above, the global model described 
in this paper has been implemented according to a two- 
pass method. It uses a physics-based reflection model as 
well as a transmission model. In this model, the light pow¬ 
ers emitted, reflected or refracted are represented by their 
spectral distribution, the materials are characterized by 
their spectral reflectance (Fresnel factor) and their spec¬ 
tral transmittance as well as their microscopic roughness. 
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As suggested in [20], all the light powers are sampled at 
four wavelengths. The used reflection model is Cook’s 
and Torrance’s model [11]. Moreover, our global model 
does not make any assumption on the specular behavior 
of materials. 

As pointed out hereafter, in our method, the scene is dis¬ 
cretized into points instead of small patches, which avoids 
the breaking down of all surfaces into small patches. So, 
any kind of object can be used. 

This paper addresses the following subjects. First, the 
global lighting model is presented as well as the reflection 
and transmission models. Then, we describe the different 
processings involved by our implementation: discretiza¬ 
tion of the scene, expression of the discretized light en- 
ergy balance equation, and the two passes. In section 4, 
a data structure, named visibility graph , is described in 
details. It will be shown that this data structure reduces 
drastically the amount of time needed for computing the 
visibility between the sample points of the scene. Finally, 
some experimental results are given, and a comparison 
with other methods is made. Methods for computing the 
Fresnel factor are given in appendix. 

2 The Global Model 

2.1 System of Light Energy Balance 
Equations 

A global illumination model must take into account all 
the reflections and refractions within the scene. To de¬ 
scribe the mechanism of light transport we use the model 
introduced in [5, 6]. This model consists of a set of equa¬ 
tions which express (in terms of radiance since it is the 
quantity the eye is sensitive to) the radiance of a point Pi 
in the direction of Pj when illuminated by all the surfaces 
Sk (see figure 1): 

Li(P„P 3 ) = h(P„Pj)* (1) 

Lf (Pi) + I R ^ Pk ’ Pi ’ P >) L ^ P «> Pi)dS k 

k ^ S k 

where 

• Lx(Pi y Pj) is the radiance of Pi as seen from Pj 
(emitted light power per unit surface and unit pro¬ 
jected solid angle), 

• h(Pi y Pj) is the visibility function, 

• Lf(Pi) is the self-emitted radiance, 

• R x (Pk y P t[ Pj) = siRKP^Pi^^ + diRtW) is ei¬ 
ther the bidirectional reflectance [11], or the bidirec¬ 
tional transmittance [23], 

• Lx(PkyPi) is the radiance of Pk as seen from P,, 

• G(Pk, Pi) = is a purely geometrical term, 

• A is a given wavelength. 

The system made up of these equations is called system 
of light energy balance equations . Solving this system will 
provide the global radiance at each point of the scene. 



Figure 1: Geometry of light transport mechanism 

2.2 Reflection Model 

The used reflection model is the one proposed by Cook 
and Torrance [11]. With this model, the reflected light de¬ 
pends on the wavelength, the incidence angle, the rough¬ 
ness parameter, and the surface refractive index (this in¬ 
dex is a complex number for metallic materials). This 
model takes into account the polarization of the light, the 
roughness and the masking/shadowing of the materials. 
Let us briefly review this model. 

This model is expressed as: 

R = sR 3 -f* dRd with s 4- d = 1 

where Rd and R» are respectively the diffuse and specular 
components, d and s are the proportions of the incident 
light which give rise to the diffuse and specular compo¬ 
nents respectively. 

Rd is independent of the incident angle, and can be ap¬ 
proximated by ^ [11], where F(A,0) is the Fresnel 
factor for a normal incidence. 

R s accounts for the roughness as well as for the mask¬ 
ing/shadowing effects, and is expressed as: 

Rm „ 1 F{\ t 0).D.G 

47T COS0, COS Or ’ 

where jF(A, 9) is the Fresnel factor, Bi is the incidence an¬ 
gle (direction D%) t 0 r the reflection angle (direction D r ) 
and 9 equals half of the angle (Di,D r ). G is the mask¬ 
ing/shadowing function, and D models the roughness ef¬ 
fect. In our implementation, D is the Beckman function. 
The Fresnel factor is given by the Fresnel formula. In 
appendix, we show how this factor can be computed effi¬ 
ciently (even in case of metallic materials). 

2.3 Transmission Model 

So far, no physics-based transmission models have been 
proposed in the literature, but only an empirical one [14]. 
Rather than using an empirical transmission model, it 
is more realistic, for each material, to use transmittance 
values experimentally obtained with the help of a spec¬ 
trophotometer [12]. 
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Figure 2: Sampling a polygon into points 


However, if any transmission model exists, it can be easily 
integrated in our implementation. For this reason, at the 
present time, only ideal specular refraction is available in 
our method. In case of ideal specular refraction, R s is no 
more than 1 — F(A,0), and s = 1. 

3 The Method 

3.1 Discretizing the Scene Into Points 


surface containing it. These areas 6S are used to evaluate 
the solid angles between two sample points. This process 
is illustrated by figure 2. We preferred to consider the cen¬ 
ters of the subsurfaces as sample points, rather than the 
vertices, so as to avoid undesirable effects along the edges 
shared by two surfaces. Note that a better subdivision 
would be to add a second threshold for the differential 
solid angles between two samples as suggested in [15, 16]. 


3.2 The Discretized Light Energy Bal¬ 
ance Equation 

Since the scene is discretized into points, only the point- 
to-point light contributions have to be evaluated. In order 
to discretize equation (1), we exploit the fact that with 
each sample point of a surface S* is associated a surface 
area £5*. Equation (1) becomes then: 




hiJ 


N 

Lf + ^Jl k0 G ki L ki 6S k 
*=1 


( 2 ) 


3.1.1 Motivation 

Any method used to implement a global illumination 
model requires a discretization of the scene either into 
patches or points. In our method, we have chosen to dis¬ 
cretize the scene into points for the following reasons: 

• we have to evaluate the radiance for each point of 
the scene in one direction (radiance is directional), 

• to evaluate the radiance of a point, it is not necessary 
to consider all the points of the scene. Indeed, since 
the radiance is an integral, we can evaluate it by 
taking a certain number of samples of the variables 
of the integrand (Gauss or Monte Carlo methods), 

• visibility between two points can be easily computed 
by ray tracing, 

• extended form factors, in presence of non ideal spec¬ 
ular surfaces, can be easily evaluated by tracing rays 
from point to point, while their evaluation is very 
difficult when using patches, 

• several kinds of surface can be sampled into points: 
polygons, spheres, cylinders, cones, parametric sur¬ 
faces etc.. Subdivision into patches can then be 
avoided, which allows the use of different kinds of 
geometric models. 

3.1.2 The Discretization Method 

In our present implementation, the scene is made up of 
a collection of convex quadrilaterals. The discretization 
process consists in recursively subdividing each quadri¬ 
lateral in four subsurfaces. The recursion stops when the 
area of a subsurface is below a certain threshold fixed by 
the user. Once this subdivision has been accomplished, a 
sample point is placed at the center of each subsurface. 
With each sample point is associated the area 6S of the 


where N is the number of sample points in the environ¬ 
ment. Using this point sampling and separating the dif¬ 
fuse and specular reflections, we obtain: 

L, } = h, } [Lf + Lf + L\,) (3) 

N N 

Lf = ^2diRiG ki L ki 6S k = ^2>(I fet ) (4) 

k=\ k=\ 

n n 

Lij = ^ ^ sjRf-ijGkjLkjSSk = ^ ^ S(Lkt)i (5) 

fc=i *=i 

where Lf is the diffuse component of the reflected light, 
Lf the specular component and n is the number of sample 
points included in the specular reflection cone (or specular 
transmission cone) which bounds the specular component 
R a of R. The angle of such cones depends of the roughness 
of the materials. The more important the roughness, the 
larger the angle. Such cones are defined by an axis which 
is the perfect reflection (or transmission) direction, and by 
an angle which depends on the physical properties of the 
materials (see figure 3). In our case, the reflection cone 
bounds the Beckman function D } since this latter models 
the roughness. In case of perfectly specular materials, the 
reflection cone is reduced to the perfect reflection direc¬ 
tion. As no sample point will lie along this direction, we 
select the sample point which is closest to this direction. 
Note that the operators V and $ are the global diffuse 
and global specular operators, respectively. 

Since transmission is treated exactly analogously to re¬ 
flection, we will omit it for now. 

The diffuse component is evaluated by adding the contri¬ 
bution of all points in the scene to a given point, while 
the specular component takes into account only the points 
whose contribution will be significant, i.e. the points in¬ 
cluded in the reflection cone. 
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Figure 3: the reflection cone This system can be solved (for each wavelength) by using 

the same methods as for radiosity (complete or iterative 
solution [9]). 

3,3 First Pass 

3.3.1 Extended Form Factors 3.3.2 Algorithm 


Since a reflected light can be separated into diffuse and 
specular components [5], the two-pass method consists of 
two main steps: the first one is a view independent com¬ 
putation of the Global Diffuse Radiance while the second 
evaluates the view dependent Global Specular Component 
Let us rewrite equation (1): 


Lij — hij 


N 


Lf + Y^( d i R i + siRLj)G ki L ki 6S k 


= hi 


N 


Lf + J2diR?G ki L k i6S k + Yi»i R iijGkiL ki 6S k 


k= 1 


fc=l 


= hi 


Lf + Y^V(L ki ) + S(Lu) 


*= 1 


The global diffuse radiance Lf of a point is then given by: 


L? 


N N 

= Lf + ^2 d>RfG ki L k< 6S k = Lf + Y J V{Lki). 

k =1 *=1 


( 6 ) 


The expression of Lf , as function of L d is then: 

N n n n 

Lf = Lf + £(I> + V(J2 5) + V(J2 S( £ ^)) + •••) 4 

J = 1 k= 1 1=1 m= 1 

The term [V + Z>(£ fc S) + Z>(£, S(£ m *» + is 
called Extended Form Factor and is named EFFji by 
analogy with radiosity [25, 6]. 

It is the proportion of diffuse light emitted by the surface 
Sj that contributes to the global diffuse radiance of the 
point Pi (see figure 4). Using these terms we obtain a 
system of N equations of N unknowns: 

Lf = Lf + Yl \ EFF > i * L 1] ( 7 ) 


The following algorithm computes the contribution of one 
given point to all its environment. This algorithm uses a 
shooting process since it allows a progressive solution. 

/* statements */ 

/* matrix of spectral extended form factors */ 
spectrum FF/*[#][#] 

/* N is the number of sample points */ 

/* spectrum initialized to 1.0 */ 
spectrum sped = {1.0, 1.0} 

/* computes the j th column of the extended form 
factors matrix */ 

EvaluateEFF(j) 

/* j: emitting point */ 

{ 

for all points Pi { 

/* geometrical term */ 
compute FFji = hj % G Jt 6Sj 
/* for each spectrum sample */ 

= EFF x [j][i\ + FFji * diRf, 

if ^ 0) { /* specular surface */ 

GlobSpecDpCj, j, i, sped, 1) 

} 

} 

} 

The procedure EvaluateEFF evaluates the direct contri¬ 
bution of the global diffuse radiance of Pj to the global 
diffuse radiance of all the points of the scene. All points 
being illuminated by Pj and belonging to a non perfectly 
diffuse surface emit the specularly reflected light to the 
environment. This is made possible thanks to the follow¬ 
ing procedure GlobSpecOp. 
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Voxel2 


/* Global Specular Operator*/ 

GlobSpecOp(j, 1, i, AEFFx, lg) 

/* j: emitting point */ 

/* i: last point met */ 

/* Is previous point in light path */ 

/* AEFFx: cumulated contribution */ 

/* of Pj to Pi via Pi */ 

/* lg: length of path from Pj to Pi */ 

{ 

/* light is emitted by Pi */ 

/* and reflected by P, towards points P**/ 

/* geometrical term */ 
compute FFu = 

for all points P* € reflection cone { 

/* geometrical term */ 
compute FFik — &»*G«k65< 

/* for each wavelength */ 

APFPa = AEFFx * FFu * siRfuk 
EFFx[j)[k) = EFF x [j][k] 

+ AEFFx * FFik * dkRxk 

if ((lg < lgmax) 

and(APPP A > EFFMINx ) 

and(s fc ^ 0)) { 

GlobSpecOpCj, i, k, AEFFx, lg+1) 

} 

} 

} 


3.4 Second Pass 

In the second pass (which is view dependent) the global 
specular operator is evaluated by means of a distributed 
ray tracing [10]. Note that this step does not entail the 
shooting of rays towards light sources (shadow rays), since 
the global diffuse component of the sample points (of the 
scene) is already available. Moreover, the intersection 
between a shot ray and the scene results in a point which 
may not be a sample point. However, the global diffuse 
component of the radiance at this intersection point can 
be interpolated by using the diffuse spectral radiances of 
the four closest sample points computed at the first pass 
as done in [2, 18]. 

4 Visibility Calculation 

This section shows how the visibility between two sample 
points is performed. To evaluate the visibility between 
a pair of sample points Pi and P Jt a ray is cast from Pi 
toward P y If this ray hits one object before reaching P Jt 
then the two sample points do not see each other. This 
process entails a large amount of ray-object intersections 
and needs to be accelerated. To this end, two data struc¬ 
tures are used: spatial subdivision and visibility graph. 



Voxel 1 and Voxel2 are fully visible 
Voxel2 and VoxeM are fully hidden 
Voxel 1 and VoxeM are partially hidden: 
the subgraph (pll l p31),(pll l p32) l 
(pl2 1 p31) l (pl2,p32) 

is stored 

Figure 5: Voxel-to-voxel visibility information 

4.1 Spatial Subdivision 

The parallelepipedic bounding volume of the scene is re¬ 
cursively subdivided by planes aligned with the coordi¬ 
nate system axes. Each slicing plane subdivides a space 
into two equal sized subspaces. This subdivision results, 
in a set of unequal sized subspaces which, from now, are 
called voxels. With each voxel is associated a list of sam¬ 
ple points located in this voxel. This recursive subdivi¬ 
sion stops either when the number of objects (which are 
convex quadrilaterals in our current implementation) in¬ 
tersecting the current voxel is below a certain threshold, 
or when the maximum level of subdivision is reached. 

To apply Amantides’s traversal algorithm [1], a spatial 
index SI is used. This spatial index is a 3 D grid, whose 
each element 5/[i, j, A:] is a pointer to a voxel (see [4] for 
more details). 

In contrast to a uniform grid, our spatial subdivision into 
unequal sized voxels allow to reduce the amount of mem¬ 
ory needed to store the visibility graph described here¬ 
after. 

Note that this spatial subdivision is also used in the sec¬ 
ond pass to calculate the global specular component by 
ray tracing. 

4.2 The Visibility Graph 

4.2.1 The Graph 

Since the scene is sampled into points, it seems worth¬ 
while to build a visibility graph giving a boolean visibility 
information for all pairs of points, instead of building a 
valuated graph whose complexity is 0(N Z ) for each wave¬ 
length as done by Buckalew [7]. 
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Voxel j 





Voxel i 


Figure 6: Visibility between two voxels 

This visibility graph is very attractive but requires a large 
memory. To cope with this problem of memory size, we 
propose the following visibility strategy. Since the scene 
is subdivided into voxels, we build a visibility graph giv¬ 
ing a visibility information between each pair of voxels 
instead of sample points. The visibility test checks how 
each voxel is visible from the other voxels. This test fires 
rays between the two voxels. Each ray corresponds to 
a pair of sample points belonging to different voxels as 
shown in figure 6. 

The data structure used to store the visibility graph is : 

Nedges = N(N+l)/2; 

/♦where N is the number of voxels^/ 
typedef struct /♦edge data structure^/ 

{ 

boolean visibility; 
object ♦buf-int; 
booleanmatrix ♦voxel-graph; 

}Edge; 

/♦visibility graph data structure^/ 

Edge Visibility-Graph[1..Nedges]; 

The nodes of the visibility graph are voxels, while its edges 
are elements of the linear array Visibility-Graphfl. The 
field voxel-graph is a pointer to a boolean matrix storing 
the visibility information between the points of the two 
voxels of an edge. 

Let i and j be the two voxels of edge k. Three cases can 
be considered: 

1. all points of voxel i are visible from all points of j . 
Visibility-Graphfk].visibility = true; 
Visibility-Graph[k].buf-int = null; 
Visibility-Graph[k].voxel-graph = null; 

2. the sample points of i do not see those of j. 
Visibility-Graph[k].visibility = false; 
Visibility-Graph[k].buf-int = null; 

Visibility-Graphfk].voxel-graph = null; 

3. the sample points of t see only a part of those of 


j (figure 5). In such a case we decide to store 
the visibility information between all pairs of points 
included in the two relevant voxels as well as the 
pointer to one object lying between these two vox¬ 
els. 

Vis ibility-Graphfk].visibility = false; 
Visibility-Graph[k] .buf-int ** 
pointer-object; 

Visibility-Graph[k].voxel-graph = 
pointer-matrix; 

Note that the field buf-intoi the data structure Edge plays 
an important role. Indeed, during the visibility test be¬ 
tween two voxels, as soon as a fired ray intersects objects 
lying between the two voxels, only the pointer to the clos¬ 
est object is stored in the field 6u/-mf of Edge. Due to the 
spatial coherence, the next fired ray has a great probabil¬ 
ity to intersect the same closest object between the two 
voxels. Consequently, this next fired ray will be checked 
for intersection with only this closest object, the pointer 
of which is already in buf-int, which saves a significant 
amount of computation. This approach seems to be a 
simplified version of the light buffer method [13]. 

In most scenes we have treated in our experiments, this 
strategy appeared rather efficient but we must keep in 
mind that the real memory complexity of this graph is 
always 0(N 2 ). 

4.2.2 Using the Visibility to Improve the 
Scene Discretization 

During the evaluation of the visibility graph, the distance 
between each pair of points is computed. If the distance 
between two points is small compared to their associated 
surface area 6S (important solid angle), the correspond¬ 
ing surface elements are locally subdivided. We obtain 
then new sample points with smaller associated surface 
areas. Thereby, the solid angles between these two points 
become smaller, which makes the computation more ac¬ 
curate. 

4.2.3 Other Advantages 

As said above, the visibility graph contains only purely 
geometric information, independent of the photometric 
properties of the objects. This allows to modify these 
properties (reflectance, transmittance, roughness, self- 
emittance...) while keeping the same visibility graph. 
Indeed, to obtain new values of radiance, only a graph 
traversal is needed. Moreover, when a few objects are 
moved, only a small part of the visibility graph has to 
be modified. This graph might be updated with a rapid 
incremental method. This method is currently under in¬ 
vestigation. 

5 Results 

We express the cost of the evaluation of the extended form 
factors matrix in term of numbers of calls to the visibility 
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number of voxels 

102 

number of empty voxels 

5 

number of pairs of points 

16076090 

number of pairs of visible voxels 

100 

number of pairs of hidden voxels 

1009 

number of pairs 
of partially hidden voxels 

4042 

number of pairs of points stored 

5 889 008 

graph computation time 

39mn 49sec. 


Table 1: Cost of the visibility graph 



With Graph 

Without Graph 

Time (seconds) 

48 mn 33sec 

3h 24mn 12sec 

Number of visibility 
computations 

8038045 

42 433 940 


Table 2: Computation of the matrix of extended form 
factors 

function hij. The cost of our method is given when the 
visibility graph is used, and when it is not. The results 
obtained are in favor of the use of this graph. 

Our test scene is made up of 134 polygons. The mate¬ 
rials of the objects of the scene are gold, sand, concrete, 
brown stone, blue and green enamel. All these materials 
are perfectly difFuse except gold (leg of the table) and sil¬ 
ver (mirror) whose parameters are: s = 0.9, d = 0.1 and 
the roughness coefficient m = 0.3 (Beckmann term) for 
gold, and s = 0.9, d = 0.1 and m = 0.05 for silver. The 
scene includes two primary light sources which are nor¬ 
malized D6500 white sources. The sampling of the scene 
results in 4010 points, 196 of which lie on specular sur¬ 
faces. To emphasize the influence of specular materials 
on the global diffuse radiance, our test scene was pro¬ 
cessed according to three different ways, giving the three 
following images: 

image 1 (figure 7): all the materials are assumed to be 
perfectly difFuse, 

image 2 (figure 8): scene containing difFuse and specular 
materials, image resulting from the first pass, 

image 3 (figure 9): scene containing difFuse and specular 
materials, final image obtained after the two passes. 

Note that the light spots near the leg of the table are the 
global difFuse component due to the specular properties 
of this golden leg. The spot light near the door is due to 
specular reflection on the silver mirror. 

The amount of memory required to store (table 1) the 
visibility graph has been drastically reduced thanks to our 
non uniform spatial subdivision, whereas this amount is 
very important for a spatial subdivision into a regular 3 D 
grid. 

Among the 8.038.045 visibility calculations to be per¬ 
formed, only 1.394.017 of them are actually made. In¬ 
deed, the visibility computations corresponding to the 
following cases are avoided: 


• sample points of the same surface; 

• the angle formed by the ray direction and the normal 
at a sample point is greater than 90 deg; 

• the buffer buf-int is used to evaluate the visibility 
function. 

6 Comparisons with Other Al¬ 
gorithms 

In contrast to our model, the global illumination algo¬ 
rithms described in [29, 28, 18] use an empirical reflection 
model, a trichromatic approximation, and an ideal spec¬ 
ular reflection. Even though the algorithm in [27] uses a 
new physics-based reflection model, it is limited to ideal 
reflection. As for the multi-pass ones [26, 8], they seem 
more suited for rendering caustics but are very time ex¬ 
pensive compared to the two-pass methods. 

7 Conclusion 

Unlike most of the models already implemented, our il¬ 
lumination model accounts for a physics-based reflection 
model, spectra instead of a trichromatic approximation, 
the spectral reflectance and transmittance of materials as 
well as color science. In our implementation, the scene is 
sampled into points instead of small patches. To prove 
that point sampling is correct, we have generated one im¬ 
age with that kind of sampling. It has been compared 
with the image of the same scene generated by a tech¬ 
nique based on a discretization into patches. The visual 
results seem very similar. This point discretization of¬ 
fers the advantage of evaluating, very easily, the extended 
form factors when no assumption is made on the specular 
behavior of materials. 

Even though the visibility graph requires an important 
memory size, it significantly reduces the synthesis time, 
and in addition, it is well suited to an adaptive point dis¬ 
cretization that improves the precision of the solid angle 
calculations, which avoids thus all artifacts. 
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APPENDIX 

Computing the Fresnel factor 

We have implemented two methods to efficiently evaluate 

F( M). 
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First method 

In several books [32, 3, 31], we can find, for several materi¬ 
als, Fresnel factor curves F( A, 0) for normal incidence, as 
well as the refraction index h for the wavelength A = 589 
(Sodium D lines) which corresponds to the center of the 
visible bandwith. Given these data, F(\, 9) can be ap¬ 
proximated [11], for each wavelength, by: 


F(\ it 9) = F{\ h 0) 




F(A,fl)-.F(A,0) 

F(A,f)-F(A,0)’ 


where F( A, 9) is given by the Fresnel formula for h. 


Second method 

In [31], for several materials, values of the refraction index 
are given for a certain number of wavelengths. In this 
case, F(A,0) can be exactly expressed with the Fresnel 
formula. 


Storage of jF(A, 9 ) 

Knowing the expression of F(\,9), we can precompute it 
for each sample wavelength and for different values of 9 
(20 seem enough). These values allow to create a look¬ 
up table, from which any F(X 1 9) can be computed by a 
simple linear interpolation. 
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Figure 9: Image 3: result of the two passes 
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